Commit b5522cc

Merge amd-staging into amd-mainline 20251117 (#838)
Signed-off-by: Maisam Arif <[email protected]>
2 parents 4ff9527 + a044536 commit b5522cc

File tree

17 files changed: +175 -49 lines changed

CHANGELOG.md

Lines changed: 23 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -67,16 +67,29 @@ Full documentation for amd_smi_lib is available at [https://rocm.docs.amd.com/pr
6767

6868
### Changed
6969

70-
- **`amd-smi set --power-cap` now requires specification of the power cap type**.
71-
- Command now takes the form: `amd-smi set --power-cap <power-cap-type> <new-cap>`. Acceptable power cap types are "ppt0" and "ppt1".
72-
Ex.
70+
- **The `amd-smi` command now shows hsmp rather than amd_hsmp**.
71+
- The hsmp driver version can be shown without the amdgpu version using `amd-smi version -c`
72+
73+
```console
74+
$ amd-smi version
75+
AMDSMI Tool: 24.7.1+b446d6c-dirty | AMDSMI Library version: 24.7.2.0 | ROCm version: N/A | amdgpu version: 6.10.10 | hsmp version: 2.2
76+
77+
$ amd-smi version -c
78+
AMDSMI Tool: 24.7.1+b446d6c-dirty | AMDSMI Library version: 24.7.2.0 | ROCm version: N/A | hsmp version: 2.2
79+
...
80+
```
81+
82+
- **`amd-smi set --power-cap` now requires specification of the power cap type**.
83+
- Command now takes the form: `amd-smi set --power-cap <power-cap-type> <new-cap>`
84+
- Acceptable power cap types are "ppt0" and "ppt1"
85+
86+
```console
87+
$ sudo amd-smi set --power-cap ppt1 1150
88+
GPU: 0
89+
POWERCAP: Successfully set ppt1 power cap to 1150W
90+
...
91+
```
7392

74-
```console
75-
$ sudo amd-smi set --power-cap ppt1 1150
76-
GPU: 0
77-
POWERCAP: Successfully set PPT1 power cap to 1150W
78-
...
79-
```
8093
- **`amd-smi reset --power-cap` will attempt to reset both power caps**.
8194
- When using the reset command, both PPT0 and PPT1 power caps will be reset to their default values. If a device only has PPT0, then only PPT0 will be reset.
8295
Ex.
@@ -1402,7 +1415,7 @@ Functions affected by struct change are:
14021415
- **Corrected CLI CPU argument name**.
14031416
- `--cpu-pwr-svi-telemtry-rails` to `--cpu-pwr-svi-telemetry-rails`
14041417

1405-
- **Added amdgpu driver version and amd_hsmp driver version to `amd-smi version` command**.
1418+
- **Added amdgpu driver version and amd_hsmp driver version to `amd-smi version` command**.
14061419
- The `amd-smi version` command can now also display the amdgpu driver version using the `-g` flag.
14071420
- The amd_hsmp driver version can also be displayed using the `-c` flag.
14081421
- The new default for the `version` command is to display all the version information, including both amdgpu and amd_hsmp driver versions.
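
Scripts that scrape the human-readable `amd-smi version` line may be affected by the `amd_hsmp` to `hsmp` label rename described above. The following is a minimal sketch, not part of this commit, of a parser that tolerates both labels. The helper name `parse_version_line` is hypothetical, and the sample string is copied from the changelog example rather than captured from a live system.

```python
def parse_version_line(line: str) -> dict:
    """Split a pipe-separated `amd-smi version` line into a {label: value} dict."""
    fields = {}
    for part in line.split("|"):
        # Each part looks like "label: value"; split on the first colon only.
        label, sep, value = part.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


# Sample output taken from the changelog entry for `amd-smi version -c`.
sample = (
    "AMDSMI Tool: 24.7.1+b446d6c-dirty | AMDSMI Library version: 24.7.2.0 | "
    "ROCm version: N/A | hsmp version: 2.2"
)
info = parse_version_line(sample)

# Accept either label so the lookup works before and after the rename.
hsmp_version = info.get("hsmp version", info.get("amd_hsmp version"))
print(hsmp_version)
```

Looking up both keys keeps a monitoring script working across tool versions that print either label.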

amdsmi_cli/amdsmi_commands.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,7 @@ def __init__(self, format='human_readable', destination='stdout', helpers=None)
9696
except amdsmi_exception.AmdSmiLibraryException as e:
9797
if e.err_code in (amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_NOT_INIT,
9898
amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_NO_DRV):
99-
logging.info('Unable to detect any CPU devices, check amd_hsmp version and module status (sudo modprobe amd_hsmp)')
99+
logging.info('Unable to detect any CPU devices, check amd_hsmp (or) hsmp_acpi version and module status (sudo modprobe amd_hsmp (or) sudo modprobe hsmp_acpi)')
100100
else:
101101
raise e
102102

@@ -112,7 +112,7 @@ def __init__(self, format='human_readable', destination='stdout', helpers=None)
112112

113113
if len(self.cpu_handles) == 0 and len(self.core_handles) == 0:
114114
# No CPU's found post amd_hsmp driver initialization
115-
logging.error('Unable to detect any CPU devices, check amd_hsmp version and module status (sudo modprobe amd_hsmp)')
115+
logging.error('Unable to detect any CPU devices, check amd_hsmp (or) hsmp_acpi version and module status (sudo modprobe amd_hsmp (or) sudo modprobe hsmp_acpi)')
116116
exit_flag = True
117117

118118
self.convert_clock_type = {
@@ -200,7 +200,7 @@ def version(self, args, gpu_version=None, cpu_version=None):
200200
if args.gpu_version:
201201
human_readable_output = human_readable_output + f" | amdgpu version: {gpu_version_str}"
202202
if args.cpu_version:
203-
human_readable_output = human_readable_output + f" | amd_hsmp version: {cpu_version_str}"
203+
human_readable_output = human_readable_output + f" | hsmp version: {cpu_version_str}"
204204
# Custom human readable handling for version
205205
if self.logger.destination == 'stdout':
206206
print(human_readable_output)
@@ -2988,7 +2988,7 @@ def metric_cpu(self, args, multiple_devices=False, cpu=None, cpu_power_metrics=N
29882988
try:
29892989
bandwidth = amdsmi_interface.amdsmi_get_cpu_current_io_bandwidth(args.cpu,
29902990
int(args.cpu_io_bandwidth[0][0]),
2991-
args.cpu_io_bandwidth[0][1])
2991+
args.cpu_io_bandwidth[0][1].upper())
29922992
static_dict["io_bandwidth"]["band_width"] = bandwidth
29932993
except amdsmi_exception.AmdSmiLibraryException as e:
29942994
static_dict["io_bandwidth"]["band_width"] = "N/A"
@@ -2998,7 +2998,7 @@ def metric_cpu(self, args, multiple_devices=False, cpu=None, cpu_power_metrics=N
29982998
try:
29992999
bandwidth = amdsmi_interface.amdsmi_get_cpu_current_xgmi_bw(args.cpu,
30003000
int(args.cpu_xgmi_bandwidth[0][0]),
3001-
args.cpu_xgmi_bandwidth[0][1])
3001+
args.cpu_xgmi_bandwidth[0][1].upper())
30023002
static_dict["xgmi_bandwidth"]["band_width"] = bandwidth
30033003
except amdsmi_exception.AmdSmiLibraryException as e:
30043004
static_dict["xgmi_bandwidth"]["band_width"] = "N/A"
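
The two `.upper()` changes above make the string half of the bandwidth argument case-insensitive before it reaches the library call. A minimal sketch of that normalization in isolation follows. The helper name `normalize_bandwidth_arg` and the sample value `"p2"` are illustrative assumptions and are not taken from the CLI; only the indexing pattern (`[0][0]` cast to `int`, `[0][1]` upper-cased) comes from the diff.

```python
def normalize_bandwidth_arg(parsed_arg):
    """Return (number, name) from a nested argparse-style value.

    Mirrors the diff: the first element is cast to int and the second is
    upper-cased so that lower-case input is treated like upper-case input.
    """
    number = int(parsed_arg[0][0])
    name = parsed_arg[0][1].upper()
    return number, name


# Hypothetical input shaped like args.cpu_io_bandwidth in the diff.
print(normalize_bandwidth_arg([["1", "p2"]]))
```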

amdsmi_cli/amdsmi_helpers.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -239,11 +239,11 @@ def get_cpu_choices(self):
239239
except amdsmi_interface.AmdSmiLibraryException as e:
240240
if e.err_code in (amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_NOT_INIT,
241241
amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_DRIVER_NOT_LOADED):
242-
logging.info('Unable to get device choices, driver not initialized (amd_hsmp not found in modules)')
242+
logging.info('Unable to get device choices, driver not initialized (amd_hsmp or hsmp_acpi not found in modules)')
243243
else:
244244
raise e
245245
if len(cpu_handles) == 0:
246-
logging.info('Unable to find any devices, check if driver is initialized (amd_hsmp not found in modules)')
246+
logging.info('Unable to find any devices, check if driver is initialized (amd_hsmp or hsmp_acpi not found in modules)')
247247
else:
248248
# Handle spacing for the gpu_choices_str
249249
max_padding = int(math.log10(len(cpu_handles))) + 1
@@ -285,11 +285,11 @@ def get_core_choices(self):
285285
except amdsmi_interface.AmdSmiLibraryException as e:
286286
if e.err_code in (amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_NOT_INIT,
287287
amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_DRIVER_NOT_LOADED):
288-
logging.info('Unable to get device choices, driver not initialized (amd_hsmp not found in modules)')
288+
logging.info('Unable to get device choices, driver not initialized (amd_hsmp or hsmp_acpi not found in modules)')
289289
else:
290290
raise e
291291
if len(core_handles) == 0:
292-
logging.info('Unable to find any devices, check if driver is initialized (amd_hsmp not found in modules)')
292+
logging.info('Unable to find any devices, check if driver is initialized (amd_hsmp or hsmp_acpi not found in modules)')
293293
else:
294294
# Handle spacing for the gpu_choices_str
295295
max_padding = int(math.log10(len(core_handles))) + 1

amdsmi_cli/amdsmi_init.py

Lines changed: 9 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -62,18 +62,17 @@ def check_amdgpu_driver():
6262

6363

6464
def check_amd_hsmp_driver():
65-
""" Returns true if amd_hsmp is found in the list of initialized modules """
66-
amd_cpu_status_file = Path("/sys/module/amd_hsmp/initstate")
65+
""" Returns true if amd_hsmp or hsmp_acpi is found in the list of initialized modules """
66+
amd_cpu_status_file = Path("/dev/hsmp")
6767
if amd_cpu_status_file.exists():
68-
if amd_cpu_status_file.read_text(encoding="ascii").strip() == "live":
6968
return True
7069
return False
7170

7271

7372
def amdsmi_cli_init():
7473
""" Initializes AMDSMI Library for the CLI
7574
76-
Checks for the presence of the amdgpu and amd_hsmp drivers and initializes the
75+
Checks for the presence of the amdgpu, amd_hsmp or hsmp_acpi drivers and initializes the
7776
AMD SMI library based on the live drivers found.
7877
7978
Return:
@@ -85,13 +84,13 @@ def amdsmi_cli_init():
8584
init_flag = amdsmi_interface.AmdSmiInitFlags.INIT_ALL_PROCESSORS
8685
if check_amdgpu_driver() and check_amd_hsmp_driver():
8786
init_flag = amdsmi_interface.AmdSmiInitFlags.INIT_AMD_APUS
88-
logging.debug("Both amdgpu and amd_hsmp driver's initstate is live")
87+
logging.debug("Both amdgpu , amd_hsmp or hsmp_acpi driver's initstate is live")
8988
try:
9089
amdsmi_interface.amdsmi_init(init_flag)
9190
except (amdsmi_interface.AmdSmiLibraryException, amdsmi_interface.AmdSmiParameterException) as e:
9291
if e.err_code in (amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_NOT_INIT,
9392
amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_DRIVER_NOT_LOADED):
94-
logging.error("Drivers not loaded (amdgpu and amd_hsmp drivers not found in modules)")
93+
logging.error("Drivers not loaded (amdgpu, amd_hsmp or hsmp_acpi drivers not found in modules)")
9594
sys.exit(-1)
9695
else:
9796
raise e
@@ -107,20 +106,20 @@ def amdsmi_cli_init():
107106
sys.exit(-1)
108107
else:
109108
raise e
110-
logging.debug("amdgpu driver initialized successfully, but amd_hsmp initstate was not live")
109+
logging.debug("amdgpu driver initialized successfully, but amd_hsmp or hsmp_acpi initstate was not live")
111110
elif check_amd_hsmp_driver():
112111
init_flag = amdsmi_interface.AmdSmiInitFlags.INIT_AMD_CPUS
113-
logging.debug("amd_hsmp driver initstate is live")
112+
logging.debug("amd_hsmp or hsmp_acpi driver initstate is live")
114113
try:
115114
amdsmi_interface.amdsmi_init(init_flag)
116115
except (amdsmi_interface.AmdSmiLibraryException, amdsmi_interface.AmdSmiParameterException) as e:
117116
if e.err_code in (amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_NOT_INIT,
118117
amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_DRIVER_NOT_LOADED):
119-
logging.error("Driver not loaded (amd_hsmp not found in modules)")
118+
logging.error("Driver not loaded (amd_hsmp or hsmp_acpi not found in modules)")
120119
sys.exit(-1)
121120
else:
122121
raise e
123-
logging.debug("amd_hsmp driver initialized successfully, but amdgpu initstate was not live")
122+
logging.debug("amd_hsmp or hsmp_acpi driver initialized successfully, but amdgpu initstate was not live")
124123

125124
logging.debug(f"AMDSMI initialized with atleast one driver successfully | init flag: {init_flag}")
126125
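
The detection change above replaces the `/sys/module/amd_hsmp/initstate` check with a test for the `/dev/hsmp` device node, which does not depend on which of the two modules created it. The sketch below restates that logic in a standalone, testable form. It is an illustration rather than the CLI's code: the function names are hypothetical, the flags are returned as plain strings instead of `AmdSmiInitFlags` members, and the `INIT_AMD_GPUS` branch is an assumption because that branch falls outside the hunks shown.

```python
from pathlib import Path


def hsmp_device_present(device_path: str = "/dev/hsmp") -> bool:
    """Return True if the HSMP device node exists (amd_hsmp or hsmp_acpi loaded)."""
    return Path(device_path).exists()


def choose_init_flag(gpu_live: bool, hsmp_present: bool) -> str:
    """Pick an init flag name from the drivers that were detected."""
    if gpu_live and hsmp_present:
        return "INIT_AMD_APUS"
    if gpu_live:
        # Assumption: the GPU-only branch is not visible in the diff hunks.
        return "INIT_AMD_GPUS"
    if hsmp_present:
        return "INIT_AMD_CPUS"
    # Default used before any driver check succeeds.
    return "INIT_ALL_PROCESSORS"


print(choose_init_flag(gpu_live=False, hsmp_present=hsmp_device_present()))
```

Taking the path as a parameter lets the check be exercised against a temporary file on machines without the driver.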

amdsmi_cli/amdsmi_parser.py

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -754,7 +754,7 @@ def _add_version_parser(self, subparsers: argparse._SubParsersAction, func):
754754

755755
# help info
756756
gpu_version_help = "Display the current amdgpu driver version"
757-
cpu_version_help = "Display the current amd_hsmp driver version"
757+
cpu_version_help = "Display the current amd_hsmp or hsmp_acpi driver version"
758758

759759
# Add GPU and CPU version Arguments
760760
version_parser.add_argument('-g', '--gpu_version', action='store_true', required=False, help=gpu_version_help, default=None)
@@ -1252,8 +1252,7 @@ def _add_set_value_parser(self, subparsers: argparse._SubParsersAction, func):
12521252
set_perf_level_help = f"Set one of the following performance levels:\n\t{perf_level_help_choices_str}"
12531253
power_profile_choices_str = ", ".join(self.helpers.get_power_profiles()[0:-1])
12541254
set_profile_help = f"Set power profile level (#) or choose one of available profiles:\n\t{power_profile_choices_str}"
1255-
perf_det_choices_str = ", ".join(self.helpers.get_perf_det_levels())
1256-
set_perf_det_help = f"Set performance determinism and select one of the corresponding performance levels:\n\t{perf_det_choices_str}"
1255+
set_perf_det_help = "Enable performance determinism mode and set GFXCLK softmax limit (in MHz)"
12571256
(accelerator_set_choices, _) = self.helpers.get_accelerator_choices_types_indices()
12581257
memory_partition_choices_str = ", ".join(self.helpers.get_memory_partition_types())
12591258
accelerator_set_choices_str = ", ".join(accelerator_set_choices)

docs/how-to/amdsmi-cli-tool.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -563,8 +563,7 @@ Set Arguments:
563563
AUTO, LOW, HIGH, MANUAL, STABLE_STD, STABLE_PEAK, STABLE_MIN_MCLK, STABLE_MIN_SCLK, DETERMINISM
564564
-P, --profile PROFILE_LEVEL Set power profile level (#) or choose one of available profiles:
565565
CUSTOM_MASK, VIDEO_MASK, POWER_SAVING_MASK, COMPUTE_MASK, VR_MASK, THREE_D_FULL_SCR_MASK, BOOTUP_DEFAULT
566-
-d, --perf-determinism SCLKMAX Set performance determinism and select one of the corresponding performance levels:
567-
AUTO, LOW, HIGH, MANUAL, STABLE_STD, STABLE_PEAK, STABLE_MIN_MCLK, STABLE_MIN_SCLK, DETERMINISM
566+
-d, --perf-determinism SCLKMAX Enable performance determinism mode and set GFXCLK softmax limit (in MHz)
568567
-C, --compute-partition TYPE/INDEX Set one of the following the accelerator TYPE or profile INDEX:
569568
N/A.
570569
Use `sudo amd-smi partition --accelerator` to find acceptable values.

docs/install/install.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -37,8 +37,7 @@ AMD SMI library can run on AMD ROCm supported platforms. Refer to
3737
for more information.
3838
<!--https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html-->
3939

40-
To run the AMD SMI library, the `amdgpu` driver and the `amd_hsmp` driver need
41-
to be installed. Optionally, `libdrm` can be installed to query firmware
40+
To run the AMD SMI library, the `amdgpu` driver and the `amd_hsmp` or `hsmp_acpi` driver need to be installed. Optionally, `libdrm` can be installed to query firmware
4241
information and hardware IPs.
4342

4443
### Python interface and CLI tool prerequisites

docs/reference/amdsmi-py-api.md

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1310,12 +1310,8 @@ Input parameters:
13101310
* `cursor` the zero based index at which to start retrieving cper entries; default value is 0; for example, if there are 10 cper entries available, then with a cursor value of 8, it will retrieve the last two cper entries only
13111311

13121312
Output: Dictionary with fields, updated cursor, and a dictionary of the cper_data, status_code
1313-
status_code:
1314-
AMDSMI_STATUS_SUCCESS: If all entries were retrieved successfully
1315-
AMDSMI_STATUS_MORE_DATA: If some of the entries were retrieved and:
1316-
* A subsequent call to the API with the updated cursor will result in the fetching the next batch of entries, or
1317-
* Increasing the input buffer_size will allow more entries to be fetched with the same cursor
13181313

1314+
Output1: Dictionary with fields
13191315
Field | Description
13201316
---|---
13211317
`error_severity` | The severity of the CPER error ex: `non_fatal_uncorrected`, `fatal`, `non_fatal_corrected`. |
@@ -1326,12 +1322,25 @@ Field | Description
13261322
`signature_end` | A marker value (typically `0xFFFFFFFF`) confirming the integrity of the signature. |
13271323
`sec_cnt` | The count of sections included in the CPER entry. |
13281324
`record_length` | The total length in bytes of the CPER entry. |
1325+
`serial_number` | The product serial number. Exists in raw entries in C++ API |
13291326
`platform_id` | A character array identifying the GPU or platform. |
13301327
`creator_id` | A character array indicating the creator of the CPER entry. |
13311328
`record_id` | A unique identifier for the CPER entry. |
13321329
`flags` | Reserved flags related to the CPER entry. |
13331330
`persistence_info` | Reserved information related to persistence. |
13341331

1332+
Output2: Updated cursor (int type)
1333+
* Cursor is the index of the next cper entry in the GPU ring buffer. For example, if 10 entries were fetched successfully, the value of cursor will be 11 upon return from the API. Subsequent call to the API with cursor value of 11 should fetch the next entry
1334+
1335+
Output3: A list of dictionaries, each dictionary containing the CPER record and its size:
1336+
* {"bytes": <raw bytes>, "size": <number of bytes>}
1337+
1338+
Output4: status_code
1339+
AMDSMI_STATUS_SUCCESS: If all entries were retrieved successfully
1340+
AMDSMI_STATUS_MORE_DATA: If some of the entries were retrieved and:
1341+
* A subsequent call to the API with the updated cursor will result in the fetching the next batch of entries, or
1342+
* Increasing the input buffer_size will allow more entries to be fetched with the same cursor
1343+
13351344
Exceptions that can be thrown by `amdsmi_get_gpu_cper_entries` function:
13361345

13371346
* `AmdSmiLibraryException`
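
The four documented outputs suggest a paging loop: call again with the updated cursor while the status is `AMDSMI_STATUS_MORE_DATA`. The sketch below shows that loop under stated assumptions. `fetch` is a hypothetical stand-in for a call to `amdsmi_get_gpu_cper_entries` that returns the outputs in the documented order, the status codes are modeled as plain strings, and `fake_fetch` fabricates placeholder records purely so the loop can run without a GPU.

```python
STATUS_SUCCESS = "AMDSMI_STATUS_SUCCESS"
STATUS_MORE_DATA = "AMDSMI_STATUS_MORE_DATA"


def drain_cper(fetch, cursor=0, max_calls=100):
    """Call fetch(cursor) repeatedly until it stops reporting more data.

    fetch must return (entries, new_cursor, cper_data, status_code).
    """
    batches, records, status = [], [], None
    for _ in range(max_calls):  # bound the loop in case status never changes
        entries, cursor, cper_data, status = fetch(cursor)
        batches.append(entries)
        records.extend(cper_data)
        if status != STATUS_MORE_DATA:
            break
    return batches, records, cursor, status


def fake_fetch(cursor, total=5, batch=2):
    """Hypothetical stand-in that serves `total` placeholder records in batches."""
    end = min(cursor + batch, total)
    entries = {i: {"record_id": str(i)} for i in range(cursor, end)}
    data = [{"bytes": bytes([i]), "size": 1} for i in range(cursor, end)]
    status = STATUS_MORE_DATA if end < total else STATUS_SUCCESS
    return entries, end, data, status


batches, records, cursor, status = drain_cper(fake_fetch)
print(len(batches), len(records), cursor, status)
```

The loop only ever reuses the cursor value it was handed back, so it makes no assumption about how the library numbers entries.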

goamdsmi_shim/smiwrapper/amdsmi_go_shim.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@
4545
#define AMDGPU_INITSTATE_FILE "/sys/module/amdgpu/initstate"
4646

4747
#define AMDHSMP_DRIVER_NAME "AMDHSMPDriver"
48-
#define AMDHSMP_INITSTATE_FILE "/sys/module/amd_hsmp/initstate"
48+
#define AMDHSMP_INITSTATE_FILE "/dev/hsmp"
4949

5050
static uint32_t num_apuSockets = GOAMDSMI_VALUE_0;
5151
static uint32_t num_cpuSockets = GOAMDSMI_VALUE_0;

include/amd_smi/impl/amd_smi_cper.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -220,5 +220,5 @@ struct cper_1_0 {
220220

221221
amdsmi_status_t amdsmi_get_gpu_cper_entries_by_path(const char *amdgpu_ring_cper_file, uint32_t severity_mask,
222222
char *cper_data, uint64_t *buf_size, amdsmi_cper_hdr_t **cper_hdrs,
223-
uint64_t *entry_count, uint64_t *cursor);
223+
uint64_t *entry_count, uint64_t *cursor, uint64_t product_serial);
224224
std::vector<int> cper_decode(const amdsmi_cper_hdr_t *cper);
