Commit 259f9e1

chore: reduce number of DCGM metric labels (#489)
* chore: reduce the number of DCGM metric labels

Signed-off-by: Deezzir <[email protected]>

---------

Signed-off-by: Deezzir <[email protected]>
1 parent 08f7c80 commit 259f9e1

1 file changed, +20 -25 lines changed

src/gpu_metrics/dcgm_metrics.csv

Lines changed: 20 additions & 25 deletions
@@ -121,6 +121,26 @@ DCGM_FI_PROF_PIPE_FP16_ACTIVE, gauge, Ratio of cycles the fp16 pipes are activ
 DCGM_FI_PROF_PCIE_TX_BYTES, gauge, The rate of data transmitted over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
 DCGM_FI_PROF_PCIE_RX_BYTES, gauge, The rate of data received over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
 
+# Features and modes
+DCGM_FI_DEV_COMPUTE_MODE, gauge, Compute mode
+DCGM_FI_DEV_PERSISTENCE_MODE, gauge, Persistance mode (1 or 0)
+DCGM_FI_DEV_CC_MODE, gauge, ConfidentialCompute/AmpereProtectedMemory status (1 or 0)
+DCGM_FI_DEV_ECC_CURRENT, gauge, Current ECC mode
+DCGM_FI_DEV_VIRTUAL_MODE, gauge, Virtualization mode
+DCGM_FI_DEV_AUTOBOOST, gauge, Auto-boost enabled
+DCGM_FI_DEV_BAR1_TOTAL, gauge, Total BAR1 (in MB)
+DCGM_FI_DEV_MAX_SM_CLOCK, gauge, Maximum supported SM clock
+DCGM_FI_DEV_MAX_MEM_CLOCK, gauge, Maximum supported Memory clock
+DCGM_FI_DEV_GPU_MAX_OP_TEMP, gauge, Maximum operating temperature
+DCGM_FI_DEV_SLOWDOWN_TEMP, gauge, Slowdown temperature
+DCGM_FI_DEV_SHUTDOWN_TEMP, gauge, Shutdown temperature
+DCGM_FI_DEV_POWER_MGMT_LIMIT, gauge, Current Power limit
+DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN, gauge, Minimum Power limit
+DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX, gauge, Maximum Power limit
+DCGM_FI_DEV_ENFORCED_POWER_LIMIT, gauge, Effective Power limit that the driver enforces after taking into account all limiters
+DCGM_FI_DEV_FB_TOTAL, gauge, Total Frame buffer (in MB)
+DCGM_FI_DEV_COUNT, gauge, Number of devices on the node
+
 # Static configuration information and features
 DCGM_FI_NVML_VERSION, label, NVML Version
 DCGM_FI_DEV_BRAND, label, Device Brand
@@ -133,28 +153,3 @@ DCGM_FI_DEV_ECC_INFOROM_VER, label, ECC inforom version
 DCGM_FI_DEV_POWER_INFOROM_VER, label, Power management object inforom version
 DCGM_FI_DEV_INFOROM_IMAGE_VER, label, Inforom image version
 DCGM_FI_DEV_VBIOS_VERSION, label, VBIOS version of the device
-
-DCGM_FI_DEV_COMPUTE_MODE, label, Compute mode
-DCGM_FI_DEV_PERSISTENCE_MODE, label, Persistance mode (1 or 0)
-DCGM_FI_DEV_CC_MODE, label, ConfidentialCompute/AmpereProtectedMemory status (1 or 0)
-DCGM_FI_DEV_ECC_CURRENT, label, Current ECC mode
-DCGM_FI_DEV_VIRTUAL_MODE, label, Virtualization mode
-DCGM_FI_DEV_AUTOBOOST, label, Auto-boost enabled
-
-DCGM_FI_DEV_BAR1_TOTAL, label, Total BAR1 (in MB)
-
-DCGM_FI_DEV_MAX_SM_CLOCK, label, Maximum supported SM clock
-DCGM_FI_DEV_MAX_MEM_CLOCK, label, Maximum supported Memory clock
-
-DCGM_FI_DEV_GPU_MAX_OP_TEMP, label, Maximum operating temperature
-DCGM_FI_DEV_SLOWDOWN_TEMP, label, Slowdown temperature
-DCGM_FI_DEV_SHUTDOWN_TEMP, label, Shutdown temperature
-
-DCGM_FI_DEV_POWER_MGMT_LIMIT, label, Current Power limit
-DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN, label, Minimum Power limit
-DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX, label, Maximum Power limit
-DCGM_FI_DEV_ENFORCED_POWER_LIMIT, label, Effective Power limit that the driver enforces after taking into account all limiters
-
-DCGM_FI_DEV_FB_TOTAL, label, Total Frame buffer (in MB)
-
-DCGM_FI_DEV_COUNT, label, Number of devices on the node
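The net effect of the diff is a change in the TYPE column of 18 rows. The minimal Python sketch below parses the `FIELD, TYPE, HELP` rows of this CSV and counts fields per type before and after the commit. It assumes the file follows the counters-CSV convention of NVIDIA's dcgm-exporter, where a `label` row is presumably attached as a Prometheus label to the exported series instead of being emitted as its own series. `parse_counters`, `type_counts`, `BEFORE` and `AFTER` are hypothetical names that are not part of the repository, and the embedded excerpts are abridged to three fields per section.

```python
from collections import Counter

# Hypothetical, abridged excerpts of src/gpu_metrics/dcgm_metrics.csv
# (three fields per section), before and after this commit.
# Row format: FIELD, TYPE, HELP
BEFORE = """\
# Static configuration information and features
DCGM_FI_NVML_VERSION, label, NVML Version
DCGM_FI_DEV_BRAND, label, Device Brand
DCGM_FI_DEV_VBIOS_VERSION, label, VBIOS version of the device

DCGM_FI_DEV_COMPUTE_MODE, label, Compute mode
DCGM_FI_DEV_FB_TOTAL, label, Total Frame buffer (in MB)
DCGM_FI_DEV_COUNT, label, Number of devices on the node
"""

AFTER = """\
# Features and modes
DCGM_FI_DEV_COMPUTE_MODE, gauge, Compute mode
DCGM_FI_DEV_FB_TOTAL, gauge, Total Frame buffer (in MB)
DCGM_FI_DEV_COUNT, gauge, Number of devices on the node

# Static configuration information and features
DCGM_FI_NVML_VERSION, label, NVML Version
DCGM_FI_DEV_BRAND, label, Device Brand
DCGM_FI_DEV_VBIOS_VERSION, label, VBIOS version of the device
"""


def parse_counters(text: str) -> list[tuple[str, str, str]]:
    """Parse FIELD, TYPE, HELP rows, skipping blank lines and '#' comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at most twice so any comma inside the help text is kept.
        field, mtype, help_text = (part.strip() for part in line.split(",", 2))
        rows.append((field, mtype, help_text))
    return rows


def type_counts(text: str) -> Counter:
    """Count how many fields are declared with each metric type."""
    return Counter(mtype for _, mtype, _ in parse_counters(text))


if __name__ == "__main__":
    print("before:", dict(type_counts(BEFORE)))  # {'label': 6}
    print("after: ", dict(type_counts(AFTER)))   # {'gauge': 3, 'label': 3}
```

In the full commit, all 18 fields listed under `# Features and modes` move from `label` to `gauge`, while the version and brand strings in `# Static configuration information and features` stay as labels.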
