Required Info:
- AWS ParallelCluster version 3.14.0
- Full cluster configuration without any credentials or personal data.
- Cluster name:
{
"creationTime": "2026-01-09T19:34:00.291Z",
"headNode": {
"launchTime": "2026-01-13T22:31:58.000Z",
"instanceId": "i-08bed33afb8304b17",
"publicIpAddress": "removed",
"instanceType": "t3.medium",
"state": "running",
"privateIpAddress": "172.31.92.166"
},
"version": "3.14.0",
"clusterConfiguration": {
"url": "https://parallelcluster-a1212dd4272ca60a-v1-do-not-delete.s3.amazonaws.com/parallelcluster/3.14.0/clusters/rocky-pcluster-if0aqqa6lvin39j1/configs/cluster-config.yaml?versionId=Ujqxyjl37nQB4EX5pVifLbEGGm09MXHv&AWSAccessKeyId=ASIA3A2KWP626QOBJDYB&Signature=Znr6I8jxOjv%2FCYWrgR56xuoc1Jc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEEgaCXVzLWVhc3QtMiJHMEUCIB9L%2FBfKkd6sBlymKGVztll5vhuyINpWuiszsq8cSbZ%2BAiEA11xBsMDDH11WZ69BJzFmab0wsHRVXPvoD%2FWqhSEVCugqiAMIERABGgw3NTc2ODE1MjA1NjUiDDeK6WKq7OzJvKyTVSrlAjR6kAgPtbeVUe6K4AORiGqDwLH0W%2FtuOd5XPs0n37u4lp%2FraFoRDqx72maHRjVW9chWWXWkZrLtKVA2urRishs2kGdVitGXLLc0cZIlqmJHEjwy3pkIh%2BQhTFZY1APTVFPNg4PS7tlhX6mgzBRwBAKa%2FONjeFtxKxdbKrho%2FDMOZHpkdI5ehCMxnY7DC0rtbiwmrnc2oo9AENcuATnCovw%2FZsiuD%2FDpSpzgZxXmPRFh4YnZ8RlejwQ2Z947%2FyLwLXXe4UivIsjYVk8NQVL3qVqUue%2BSclToglb07CDSJcXysF5Mw2eteL49j2QGeuAOiO92D3kvF1NYKylZeJtUt%2B4zdwIicJ4neAenOR%2FMd71aJ02NsnnG20rcA9V3tej2y6YGDdcno8TvQgbBY6Kb8SJvQ1BXzuiwgmeAv17mVziUfdhsMFB5%2F9Js8a%2Fu1B7WGPxWENMtr7lxvKSSHkIdXUa4xzDigjD0v5vLBjqkAWUBhzLT7nDnyc0k0WtTjCaOs3g63NcLWtKIosV%2By5%2BkcMpDajZ4XQXAi7QFez3HYrd935EF4C5PbP%2F4t%2FQE%2FMC1ID1wrDv%2BpF482ZNWEbBFwKVw2cDYeQ4h7OhYqQCAxpFZhXEWk09brSeCy2IcZdfqMCEcM%2FJ0ANe6WAFwBmZXY%2Bgeod5I7bzFW2AlhpmaNug9xwaXLQz3lLqKgXiOrBwM7oTQ&Expires=1768354638"
},
"tags": [
{
"value": "3.14.0",
"key": "parallelcluster:version"
},
{
"value": "rocky-pcluster",
"key": "parallelcluster:cluster-name"
}
],
"cloudFormationStackStatus": "UPDATE_COMPLETE",
"clusterName": "rocky-pcluster",
"computeFleetStatus": "RUNNING",
"cloudformationStackArn": "arn:aws:cloudformation:us-east-1::stack/rocky-pcluster/25aa20c0-ed92-11f0-80dc-0affcf2f8753",
"lastUpdatedTime": "2026-01-14T00:16:35.277Z",
"region": "us-east-1",
"clusterStatus": "UPDATE_COMPLETE",
"scheduler": {
"type": "slurm"
}
}
Bug description and how to reproduce:
Launch an instance through an srun call. The resulting instance terminates early if it is a GPU node with instance store. Regular compute nodes launch OK. The instance types are small because the cluster is being used for prototyping and testing.
CloudWatch event
{
"datetime": "2026-01-14T00:24:23+00:00",
"version": 0,
"cluster-name": "rocky-pcluster",
"scheduler": "slurm",
"node-role": "ComputeFleet",
"level": "ERROR",
"instance-id": "i-0ecb993d5fbf812e5",
"event-type": "chef-recipe-exception",
"message": "Chef recipe exception",
"component": "config",
"compute": {
"name": null,
"instance-id": "i-0ecb993d5fbf812e5",
"instance-type": "g5.xlarge",
"availability-zone": "us-east-1a",
"address": "172.31.84.112",
"hostname": "ip-172-31-84-112.ec2.internal",
"queue-name": "gpu",
"compute-resource": "hpc-gpu-001",
"node-type": null
},
"detail": {
"failures": [
{
"exception-type": "Mixlib::ShellOut::ShellCommandFailed",
"error-title": "Error executing action `run` on resource 'execute[Setup of ephemeral drives]'",
"nesting-level": 0,
"cookbook-name": "aws-parallelcluster-environment",
"recipe-name": "ephemeral_drives",
"source-line": "/etc/chef/local-mode-cache/cache/cookbooks/aws-parallelcluster-environment/recipes/config/ephemeral_drives.rb:28:in `from_file'",
"resource-name": "Setup of ephemeral drives",
"resource-type": "execute",
"action": "run"
}
]
}
}
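For triage, the failing recipe can be pulled out of an event like the one above with a few lines of Python. This is only a sketch: `summarize_failures` and `EVENT_JSON` are hypothetical names, not part of ParallelCluster, and the embedded event is a trimmed copy of the CloudWatch event from this report that keeps only the fields the helper reads.

```python
import json

# Trimmed copy of the CloudWatch event from this report (assumption: only
# these fields are needed for a one-line summary).
EVENT_JSON = r"""
{
  "level": "ERROR",
  "instance-id": "i-0ecb993d5fbf812e5",
  "event-type": "chef-recipe-exception",
  "compute": {
    "instance-type": "g5.xlarge",
    "queue-name": "gpu",
    "compute-resource": "hpc-gpu-001"
  },
  "detail": {
    "failures": [
      {
        "exception-type": "Mixlib::ShellOut::ShellCommandFailed",
        "cookbook-name": "aws-parallelcluster-environment",
        "recipe-name": "ephemeral_drives",
        "resource-name": "Setup of ephemeral drives",
        "resource-type": "execute",
        "action": "run"
      }
    ]
  }
}
"""


def summarize_failures(event):
    """Hypothetical helper: return one summary line per entry in detail.failures."""
    compute = event.get("compute") or {}
    detail = event.get("detail") or {}
    lines = []
    for failure in detail.get("failures", []):
        lines.append(
            "{instance} ({itype}, queue {queue}): {cookbook}::{recipe} failed on "
            "{rtype}[{rname}] action {action} with {exc}".format(
                instance=event.get("instance-id", "unknown"),
                itype=compute.get("instance-type", "unknown"),
                queue=compute.get("queue-name", "unknown"),
                cookbook=failure.get("cookbook-name", "unknown"),
                recipe=failure.get("recipe-name", "unknown"),
                rtype=failure.get("resource-type", "unknown"),
                rname=failure.get("resource-name", "unknown"),
                action=failure.get("action", "unknown"),
                exc=failure.get("exception-type", "unknown"),
            )
        )
    return lines


if __name__ == "__main__":
    for line in summarize_failures(json.loads(EVENT_JSON)):
        print(line)
```

The summary only restates what the event already says: the `ephemeral_drives` recipe in the `aws-parallelcluster-environment` cookbook failed while running the "Setup of ephemeral drives" execute resource on the g5.xlarge node in the gpu queue.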