cadvisor

Author	SHA1	Message	Date
Davanum Srinivas	5f8eea99dd	Skip getVfsStats when file does not exist There are a lot of spurious exceptions in the kubernetes kubelet logs like: E1018 21:03:09.616581 22780 fs.go:332] Stat fs failed. Error: no such file or directory Since we know that calling syscall.Statfs will just fail when the path does not exist, we should just skip making the call. NOTE: fixing 2017->2018 problems in build by running `./build/jenkins_e2e.sh`	2018-01-02 17:52:38 -05:00
David Ashpole	9ffa37396f	Merge pull request #1806 from sentinelt/master fix #1607; use container creation time provided by Docker handler	2017-12-20 11:25:35 -08:00
David Ashpole	1e567c2ac3	Merge pull request #1835 from dashpole/changelog_0.28.3 changelog for v0.28.3	2017-12-07 09:52:08 -08:00
David Ashpole	e6b4e4c38a	changelog for v0.28.3	2017-12-07 09:46:35 -08:00
David Ashpole	fc6d4b920c	Merge pull request #1830 from jsravn/add-docker-timeouts Add timeouts for docker queries	2017-12-07 09:37:53 -08:00
James Ravn	57e17d8be2	Add timeouts for docker queries As these can otherwise block indefinitely due to docker issues. This is to fix https://github.com/kubernetes/kubernetes/issues/53207, where kubelet relies on cadvisor for gathering docker information as part of its periodic node status update.	2017-12-05 13:50:48 +00:00
David Ashpole	0bde1c615c	Merge pull request #1831 from brian-brazil/prometheus-labels Ensure all Prometheus metrics have the same labelnames.	2017-11-30 09:54:44 -08:00
Brian Brazil	27f103b266	Ensure all Prometheus metrics have the same labelnames. Fixes #1704	2017-11-30 16:33:37 +00:00
David Ashpole	7d11f4243f	Merge pull request #1827 from tallclair/logging Clean up cAdvisor logging	2017-11-29 10:16:58 -08:00
David Ashpole	b26bf6ebb2	Merge pull request #1826 from mindprince/gpu-docs Add docs for using nvidia gpu monitoring.	2017-11-28 17:49:29 -08:00
Tim Allclair	1eb1355ae6	Default logging to V(2)	2017-11-27 19:49:49 -08:00
Tim Allclair	5b435b4b70	Clean up cAdvisor logging	2017-11-27 19:48:05 -08:00
Tim Allclair	3a40bbfc5c	Raise verbosity on runtime registration failure	2017-11-27 19:48:04 -08:00
Rohit Agarwal	6ba3fa4e8c	Add docs for using nvidia gpu monitoring.	2017-11-27 17:43:14 -08:00
David Ashpole	49440c7e0a	Merge pull request #1818 from dashpole/changelog changelog for v0.28.2	2017-11-21 16:32:31 -08:00
David Ashpole	9689d84e7f	changelog for v0.28.2	2017-11-21 16:27:22 -08:00
David Ashpole	e420065e7d	Merge pull request #1817 from dashpole/util_clock Switch from apimachinery clock to k8s.io/utils/clock	2017-11-21 16:24:51 -08:00
David Ashpole	3166cdae87	add utils/clock dependency	2017-11-21 16:19:57 -08:00
David Ashpole	3a347ec3fe	Revert "add apimachinery clock dependency" This reverts commit `fd43dc16ba`.	2017-11-21 14:21:47 -08:00
David Ashpole	5831d72df8	Merge pull request #1814 from mindprince/accelerator-data-race Avoid race in accessing nvidiaDevices between Setup() and GetCollector()	2017-11-21 14:03:18 -08:00
Rohit Agarwal	3c3845e92f	Avoid race in accessing nvidiaDevices between Setup() and GetCollector()	2017-11-21 13:53:47 -08:00
David Ashpole	7cb3faad02	Merge pull request #1811 from dashpole/changelog_0_28_1 changelog for v0.28.1	2017-11-20 15:13:49 -08:00
David Ashpole	1cd2620be6	changelog for v0.28.1	2017-11-20 15:08:11 -08:00
David Ashpole	17dcf1ca98	Merge pull request #1779 from dashpole/on_demand_metrics On-Demand container metrics	2017-11-20 15:06:26 -08:00
David Ashpole	3d6ad6dd86	on demand metrics	2017-11-20 14:51:04 -08:00
David Ashpole	fd43dc16ba	add apimachinery clock dependency	2017-11-20 13:15:15 -08:00
David Ashpole	ece1334172	update testify dependency	2017-11-17 16:15:28 -08:00
David Ashpole	a27bed7b9d	Merge pull request #1807 from dashpole/revert_1760 Revert "fix #1708; move from inotify to fsnotify"	2017-11-17 15:03:42 -08:00
David Ashpole	577f63f3da	Merge pull request #1808 from dashpole/update_ui Update jquery and bootstrap dependencies	2017-11-17 14:42:58 -08:00
David Ashpole	ee8cbf1054	update jquery and bootstrap	2017-11-17 13:17:51 -08:00
David Ashpole	6988e70a3d	Revert "fix #1708; move from inotify to fsnotify" This reverts commit `e6b6a1ac57`.	2017-11-17 10:28:28 -08:00
David Ashpole	5231853e71	Merge pull request #1805 from andyxning/marshal_device_name_to_json_output marshal device name to json output	2017-11-15 16:36:04 -08:00
Sławek Piotrowski	2648be083a	fix #1607; use container creation time provided by Docker handler	2017-11-16 00:41:10 +01:00
David Ashpole	4466b4d9a0	Merge pull request #1795 from abhi/containerd Containerd cadvisor integration	2017-11-15 10:24:53 -08:00
Andy Xie	9211091bdc	marshal device name to json output	2017-11-15 13:45:57 +08:00
abhi	3495851c7a	Updating godeps Signed-off-by: abhi <abhi@docker.com>	2017-11-14 17:37:50 -08:00
abhi	6ad15431f4	Integrating containerd to cadvisor This commit includes changes to integrate containerd runtime to cadvisor to collect container stats Signed-off-by: abhi <abhi@docker.com> Test cases and minor changes This commit include test cases and minor fixes for the same Signed-off-by: abhi <abhi@docker.com>	2017-11-14 17:37:36 -08:00
Derek Carr	49e0496c8f	Merge pull request #1773 from runcom/crio-socket-edit container: crio: change crio socket	2017-11-14 11:18:15 -05:00
David Ashpole	c99f2418fa	Merge pull request #1801 from mindprince/update-gonvml Update gonvml to workaround bazelbuild/rules_go#1003.	2017-11-10 15:34:33 -08:00
Rohit Agarwal	e1b4d79992	Update gonvml to workaround bazelbuild/rules_go#1003.	2017-11-10 15:06:44 -08:00
David Ashpole	e2c25110e0	Merge pull request #1799 from abhi/grpc Updating grpc version to v1.3.0	2017-11-10 10:39:38 -08:00
abhi	e5d8730f4a	Updating grpc version to v1.3.0 This commit includes godep change to update the grpc version and also updates rkt version to v1.25.0. A minor change has been made in the client based on how rkt client is used in kubernetes/kubernetes. Signed-off-by: abhi <abhi@docker.com>	2017-11-09 17:32:18 -08:00
David Ashpole	c3090a95c7	Merge pull request #1794 from kant/patch-1 Formatting change for documentation	2017-11-07 09:25:15 -08:00
Darío Hereñú	9d7cd18598	Minor proposal	2017-11-07 00:19:17 -03:00
Euan Kemp	1ecd24ea8d	libcontainer: Use first cgroup subsystem found (#1792) libcontainer: Use first cgroup subsystem found	2017-11-06 15:33:59 -08:00
David Ashpole	3d2e7fcfa3	Merge pull request #1791 from dashpole/changelog_v0.28.0 v0.28.0 changelog	2017-11-06 15:05:29 -08:00
David Ashpole	1f3f49af11	v0.28.0 changelog	2017-11-06 15:00:10 -08:00
David Ashpole	9bc6590461	Merge pull request #1762 from mindprince/gpu-metrics-1436 Add per container GPU metrics	2017-11-06 12:34:13 -08:00
Rohit Agarwal	4a35130019	Collect container-level GPU metrics using NVML. When cAdvisor starts up, it would read the `vendor` files in `/sys/bus/pci/devices/*` to see if any NVIDIA devices (vendor ID: 0x10de) are attached to the node. If no NVIDIA devices are found, this code path would become dormant for the rest of cAdvisor lifetime. If NVIDIA devices are found, we would start a goroutine that would check for the presence of NVML by trying to dynamically load it at regular intervals. We need to do this regular checking instead of doing it just once because it may happen that cAdvisor is started before the NVIDIA drivers and NVML are installed. Once the NVML dynamic loading succeeds, we would use NVML’s query methods to find out how many devices exist on the node and create a map from their minor numbers to their handles and cache that map. The goroutine would exit at this point. If we detected the presence of NVML in the previous step, whenever a new container is detected by cAdvisor, cAdvisor would read the `devices.list` file from the container's devices cgroup. The `devices.list` file lists the major:minor number of all the devices that the container is allowed to access. If we find any device with major number 195 (which is the major number assigned to NVIDIA devices), we would cache the list of corresponding minor numbers for that container. During every housekeeping operation, in addition to collecting all the existing metrics, we will use the cached NVIDIA device minor numbers and the map from minor numbers to device handles to get metrics for GPU devices attached to the container.	2017-11-06 11:54:59 -08:00
Rohit Agarwal	318f28bef6	Vendor Go bindings for NVML. Don't build a static binary. We can't build a static binary because that would require bundling the closed source NVML library in cAdvisor. Instead, gonvml uses dlopen to dynamically load NVML.	2017-11-01 14:41:35 -07:00
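
The message of commit 4a35130019 describes reading a container's `devices.list` file and caching the minor numbers of every device whose major number is 195. A minimal sketch of that parsing step follows. It is not the code from the commit: the helper name `parseNvidiaMinors`, the sample input, and the choice to skip wildcard entries are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// nvidiaMajor is the major number the commit message gives for NVIDIA devices.
const nvidiaMajor = 195

// parseNvidiaMinors is a hypothetical helper. It scans devices.list content
// (lines such as "c 195:0 rwm") and returns the minor numbers of the entries
// whose major number is 195. Entries with a wildcard major or minor
// (for example "c 195:* m" or "a *:* rwm") are skipped, because they do not
// name one specific device; that policy is an assumption of this sketch.
func parseNvidiaMinors(devicesList string) []int {
	var minors []int
	scanner := bufio.NewScanner(strings.NewReader(devicesList))
	for scanner.Scan() {
		// Expected fields: device type, major:minor, access flags.
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 {
			continue
		}
		parts := strings.SplitN(fields[1], ":", 2)
		if len(parts) != 2 {
			continue
		}
		major, err := strconv.Atoi(parts[0])
		if err != nil || major != nvidiaMajor {
			continue
		}
		minor, err := strconv.Atoi(parts[1])
		if err != nil {
			continue
		}
		minors = append(minors, minor)
	}
	return minors
}

func main() {
	// Made-up sample content, not taken from a real container.
	sample := "c 1:3 rwm\nc 195:0 rwm\nc 195:1 rw\nc 195:* m\nb 8:0 r\n"
	fmt.Println(parseNvidiaMinors(sample))
}
```

In the flow the commit message describes, a list like this would be cached per container and later combined with the map from minor numbers to NVML device handles during housekeeping.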

2287 Commits