On Arm platforms, '/proc/cpuinfo' has no 'core id' or 'physical id' fields.
So we should read the sysfs CPU path directly to get
'thread_id', 'core_id' and 'node_id'.
This method also works on other platforms, such as x86 and ppc64le.
/sys/bus/cpu/devices/cpu%d contains the 'core_id' and 'node_id' information.
For example:
cat /sys/bus/cpu/devices/cpu0/topology/core_id
ls /sys/bus/cpu/devices/cpu0/node0
Signed-off-by: Bin Lu <bin.lu@arm.com>
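
The sysfs lookup described in the commit above can be sketched in Go as follows. This is a minimal illustration, not the code from the commit: the helper names (parseID, nodeIDFromEntries, cpuTopology) are hypothetical, and it assumes that the node is exposed as a "nodeN" entry inside the cpu%d directory, as in the 'ls' example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// sysfsCPUDir is the sysfs directory named in the commit message.
const sysfsCPUDir = "/sys/bus/cpu/devices"

// parseID converts the contents of a sysfs topology file (for example "3\n")
// to an integer. Hypothetical helper for this sketch.
func parseID(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// nodeIDFromEntries scans the entry names of a cpuN directory for a "nodeM"
// entry and returns M. It returns an error if no such entry exists.
func nodeIDFromEntries(names []string) (int, error) {
	for _, name := range names {
		if !strings.HasPrefix(name, "node") {
			continue
		}
		if id, err := strconv.Atoi(strings.TrimPrefix(name, "node")); err == nil {
			return id, nil
		}
	}
	return 0, fmt.Errorf("no nodeN entry found")
}

// cpuTopology reads core_id and node_id for one CPU directly from sysfs.
func cpuTopology(cpu int) (coreID, nodeID int, err error) {
	cpuDir := filepath.Join(sysfsCPUDir, fmt.Sprintf("cpu%d", cpu))

	// Equivalent of: cat /sys/bus/cpu/devices/cpuN/topology/core_id
	raw, err := os.ReadFile(filepath.Join(cpuDir, "topology", "core_id"))
	if err != nil {
		return 0, 0, err
	}
	if coreID, err = parseID(string(raw)); err != nil {
		return 0, 0, err
	}

	// Equivalent of listing the cpuN directory and looking for nodeM.
	entries, err := os.ReadDir(cpuDir)
	if err != nil {
		return 0, 0, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	nodeID, err = nodeIDFromEntries(names)
	return coreID, nodeID, err
}

func main() {
	coreID, nodeID, err := cpuTopology(0)
	if err != nil {
		fmt.Println("could not read topology for cpu0:", err)
		return
	}
	fmt.Printf("cpu0: core_id=%d node_id=%d\n", coreID, nodeID)
}
```

On a machine without NUMA information in sysfs, the sketch reports an error instead of guessing a node.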

When cAdvisor exports metrics for Docker containers, there is a root cgroup (/) and a cgroup for each Docker container (/docker/uuid).
If a Docker container has a label, that label is applied to all containers, including the root container.
Containers that don't have the label get it with an empty value. The reason is that Prometheus
does not allow a metric with the same name to be sent with different label sets, so cAdvisor fills in empty label values based on
the set of all labels for a given metric. As a result, many Docker containers can end up with a large number of empty labels
only because another container has those labels.
If Docker labels vary widely across images, the set of labels becomes enormous, and most of the labels
are empty and carry no value as Prometheus metrics. To avoid this problem, a flag is provided that allows a user to
disable exporting Docker labels as metrics.
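
The label padding described above can be illustrated with a small Go sketch. It is not cAdvisor's exporter code: labelNames, labelValues and the exportLabels parameter are hypothetical names standing in for the real implementation and for the flag, whose name the message does not give.

```go
package main

import (
	"fmt"
	"sort"
)

// labelNames returns the sorted union of label names across all containers.
// When exportLabels is false (the hypothetical stand-in for the new flag),
// no container labels are exported at all.
func labelNames(containers []map[string]string, exportLabels bool) []string {
	if !exportLabels {
		return nil
	}
	seen := map[string]struct{}{}
	for _, labels := range containers {
		for name := range labels {
			seen[name] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// labelValues returns one value per label name for a single container.
// Labels the container does not have are padded with the empty string,
// because every sample of a metric must carry the same label names.
func labelValues(names []string, labels map[string]string) []string {
	values := make([]string, len(names))
	for i, name := range names {
		values[i] = labels[name] // missing key yields ""
	}
	return values
}

func main() {
	containers := []map[string]string{
		{}, // root cgroup "/" has no Docker labels
		{"maintainer": "alice"},
		{"version": "1.2"},
	}
	for _, export := range []bool{true, false} {
		names := labelNames(containers, export)
		fmt.Printf("exportLabels=%v names=%q\n", export, names)
		for _, labels := range containers {
			fmt.Printf("  values=%q\n", labelValues(names, labels))
		}
	}
}
```

With exporting enabled, every container carries a value for every label name, most of them empty; with it disabled, the label set stays empty no matter how many distinct Docker labels exist.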

Previously, if the NVIDIA driver was not installed when cAdvisor started,
we would start a goroutine that tried to initialize NVML every minute.
This resulted in a race. The following sequence is possible:
- the goroutine tries to initialize NVML but fails, so it sleeps for a minute.
- the driver is installed.
- a container that uses NVIDIA devices is started.
This container would not get GPU stats because a minute has not passed
since the last failed initialization attempt, so NVML is not yet
initialized.

GetSpec() can be called concurrently from updateSpec() in
manager/container.go. This
results in concurrent map access on the labels map because we're
directly updating the map inside GetSpec(). The labels map from the
container handler is a reference to the map, not a copy,
which is why we get the concurrent map access.
Fix this by moving the restartcount label update to the handler's
initialization method, which is not called concurrently.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
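
The fix described in the commit above can be sketched in Go roughly as follows. The handler type, newHandler, and the "restartcount" label key are illustrative assumptions, not the actual cAdvisor definitions; the point is only that the map is written once during initialization and merely read afterwards.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// handler is a hypothetical stand-in for a container handler.
type handler struct {
	labels map[string]string
}

// newHandler runs once per container, before the handler is shared between
// goroutines. All writes to the labels map happen here.
func newHandler(labels map[string]string, restartCount int) *handler {
	// Copy the caller's map so the handler owns its own labels.
	owned := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		owned[k] = v
	}
	if restartCount > 0 {
		owned["restartcount"] = strconv.Itoa(restartCount)
	}
	return &handler{labels: owned}
}

// GetSpec only reads the map. Concurrent reads of a Go map are safe as long
// as nothing writes to it, so callers must treat the result as read-only.
func (h *handler) GetSpec() map[string]string {
	return h.labels
}

func main() {
	h := newHandler(map[string]string{"app": "web"}, 3)

	// Simulate concurrent GetSpec() calls.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = h.GetSpec()["restartcount"]
		}()
	}
	wg.Wait()
	fmt.Println(h.GetSpec())
}
```

Had GetSpec() written the restartcount entry on every call, as in the buggy version the message describes, the concurrent calls in main would be a data race that `go run -race` can report.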