zoneinfo: 'per-node stats' sections seem to confuse the parser #386

@knweiss

On RHEL 8.3 and 8.4 kernels, the zoneinfo collector in node-exporter 1.1.2 (which is based on procfs) appears to misparse /proc/zoneinfo.

Example:

# uname -a
Linux rhel83 4.18.0-240.15.1.el8_3.x86_64 #1 SMP Wed Feb 3 03:12:15 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
# grep -E '(managed|^Node)' /proc/zoneinfo 
Node 0, zone      DMA
        managed  3840
Node 0, zone    DMA32
        managed  580234
Node 0, zone   Normal
        managed  45882525
Node 0, zone  Movable
        managed  0
Node 0, zone   Device
        managed  0
Node 1, zone      DMA
        managed  0
Node 1, zone    DMA32
        managed  0
Node 1, zone   Normal
        managed  46688852
Node 1, zone  Movable
        managed  0
Node 1, zone   Device
        managed  0
# curl -o metrics http://localhost:9100/metrics
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  151k    0  151k    0     0  3360k      0 --:--:-- --:--:-- --:--:-- 3360k
# grep node_zoneinfo_managed metrics
# HELP node_zoneinfo_managed_pages Present pages managed by the buddy system
# TYPE node_zoneinfo_managed_pages gauge
node_zoneinfo_managed_pages{node="0",zone=""} 3840
node_zoneinfo_managed_pages{node="0",zone="DMA32"} 580234
node_zoneinfo_managed_pages{node="0",zone="Device"} 0
node_zoneinfo_managed_pages{node="0",zone="Movable"} 0
node_zoneinfo_managed_pages{node="0",zone="Normal"} 4.5882525e+07
node_zoneinfo_managed_pages{node="1",zone=""} 4.6688852e+07
node_zoneinfo_managed_pages{node="1",zone="DMA"} 0
node_zoneinfo_managed_pages{node="1",zone="DMA32"} 0
node_zoneinfo_managed_pages{node="1",zone="Device"} 0
node_zoneinfo_managed_pages{node="1",zone="Movable"} 0

Note that there is no zone="DMA" label for node 0 and no zone="Normal" label for node 1. Their managed values (3840 and 46688852) are reported under zone="" instead.

From a quick look I suspect this is caused by the "per-node stats" lines of /proc/zoneinfo. The parser resets zoneinfoElement.Zone when it sees such a line (the following numbers are from a different run):

[...]
Node 0, zone   Normal
  pages free     47098578
        min      2448054
        low      3060067
        high     3672080
        spanned  49545216
        present  49545216
        managed  46373028
        protection: (0, 0, 0, 0, 0)
[...]
Node 1, zone   Normal
  per-node stats                                                    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      nr_inactive_anon 1947
[...]
      nr_kernel_misc_reclaimable 0
  pages free     48669597
        min      2464943
        low      3081178
        high     3697413
        spanned  50331648
        present  50331648
        managed  46692949
        protection: (0, 0, 0, 0, 0)

func parseZoneinfo(zoneinfoData []byte) ([]Zoneinfo, error) {

        zoneinfo := []Zoneinfo{}

        zoneinfoBlocks := bytes.Split(zoneinfoData, []byte("\nNode"))
        for _, block := range zoneinfoBlocks {
                var zoneinfoElement Zoneinfo
                lines := strings.Split(string(block), "\n")
                for _, line := range lines {

                        if nodeZone := nodeZoneRE.FindStringSubmatch(line); nodeZone != nil {
                                zoneinfoElement.Node = nodeZone[1]
                                zoneinfoElement.Zone = nodeZone[2]
                                continue
                        }
                        if strings.HasPrefix(strings.TrimSpace(line), "per-node stats") {
                                zoneinfoElement.Zone = ""
                                continue
                        }
[...]
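The suspected behaviour can be reproduced in isolation with a reduced version of the loop above. This is a sketch under assumptions: `zoneLabel` is a hypothetical stand-in for the per-block loop, and the `nodeZoneRE` pattern here is my own guess at what the real one matches, not a copy of the procfs definition. The `resetOnPerNode` flag toggles between the behaviour in the excerpt and one possible direction for a fix (leaving the zone name untouched).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nodeZoneRE is an assumed pattern for "Node <n>, zone <name>" header
// lines; the actual procfs regexp may differ.
var nodeZoneRE = regexp.MustCompile(`(\d+), zone\s+(\w+)`)

// zoneLabel returns the zone name a block ends up with. With
// resetOnPerNode set it mimics the excerpt above and clears the zone on
// a "per-node stats" line; otherwise it skips that line and keeps the zone.
func zoneLabel(block string, resetOnPerNode bool) string {
	zone := ""
	for _, line := range strings.Split(block, "\n") {
		if m := nodeZoneRE.FindStringSubmatch(line); m != nil {
			zone = m[2]
			continue
		}
		if strings.HasPrefix(strings.TrimSpace(line), "per-node stats") {
			if resetOnPerNode {
				zone = ""
			}
			continue
		}
	}
	return zone
}

func main() {
	// Reduced block based on the "Node 1, zone Normal" output above.
	block := "Node 1, zone   Normal\n" +
		"  per-node stats\n" +
		"      nr_inactive_anon 1947\n" +
		"  pages free     48669597\n" +
		"        managed  46692949\n"

	fmt.Printf("reset: %q\n", zoneLabel(block, true))  // prints reset: ""
	fmt.Printf("kept:  %q\n", zoneLabel(block, false)) // prints kept:  "Normal"
}
```

With the reset in place, the block that carries the "per-node stats" section loses its zone name, which matches the zone="" samples in the scrape.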
