Description
Hello
I'm having an issue with the AWS GHA runners: the EC2 instances are always marked as orphan when the scale-down Lambda runs, and on the next scale-down run they are terminated, even though the job is still in progress.
I'm using version 6.0.1 and deploying everything with the Terraform module.
Here is my test.
I run a job that waits for 2 hours, printing a message to the console every minute.
The workflow is triggered at 21:33.
This is the scale-up log:
2025-01-24 21:34:34 {"level":"INFO","message":"Created instance(s): i-0425d988f2c7fa6f4","sampling_rate":0,"service":"runners-scale-up","timestamp":"2025-01-25T00:34:29.744Z","xray_trace_id":"1-67943191-450de17da578343815965378","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"ec284d3c-9be4-5940-b336-2d32a8272852","function-name":"gha-ondemand-multi-linux-x64-dem-scale-up","runner":{"type":"Repo","owner":"fabfitfun/tv-api","namePrefix":"gha-ondemand-normal-","configuration":{"runnerType":"Repo","runnerOwner":"fabfitfun/tv-api","numberOfRunners":1,"ec2instanceCriteria":{"instanceTypes":["t3a.2xlarge"],"targetCapacityType":"on-demand","instanceAllocationStrategy":"lowest-price"},"environment":"gha-ondemand-multi-linux-x64-dem","launchTemplateName":"gha-ondemand-multi-linux-x64-dem-action-runner","subnets":["subnet-048333888bdbd82e0","subnet-053fb4a528b2cf900","subnet-0f069d8ae24648af2"],"tracingEnabled":false,"onDemandFailoverOnError":[]}},"github":{"event":"workflow_job","workflow_job_id":"36153145706"}}
Then this is the scale-down log (the run that tags the instance as orphan):
2025-01-24 21:45:55 {"level":"DEBUG","message":"Found: '0' GitHub runners for AWS runner instance: 'i-0425d988f2c7fa6f4'","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.100Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}
2025-01-24 21:45:55 {"level":"DEBUG","message":"GitHub runners for AWS runner instance: 'i-0425d988f2c7fa6f4': []","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.100Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}
2025-01-24 21:45:55 {"level":"DEBUG","message":"Tagging 'i-0425d988f2c7fa6f4'","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.100Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down","tags":[{"Key":"ghr:orphan","Value":"true"}]}
2025-01-24 21:45:55 {"level":"INFO","message":"Runner 'i-0425d988f2c7fa6f4' marked as orphan.","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.388Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}
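Since the log shows scale-down finding zero GitHub runners for this instance, one thing I want to verify is whether the runner ever registered with GitHub at all while the job was running. Below is a minimal diagnostic sketch (not part of my deployment), assuming a token in the GITHUB_TOKEN environment variable with permission to read the repo's self-hosted runners and using @octokit/rest; the function name listRepoRunners is just illustrative:

```typescript
import { Octokit } from '@octokit/rest';

// Hypothetical diagnostic: list the self-hosted runners registered for the repo
// so their names/status can be compared with the EC2 instance ID in the logs.
async function listRepoRunners(owner: string, repo: string): Promise<void> {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN }); // assumes a suitable token
  const runners = await octokit.paginate(octokit.rest.actions.listSelfHostedRunnersForRepo, {
    owner,
    repo,
    per_page: 100,
  });
  for (const r of runners) {
    console.log(`${r.name}\tstatus=${r.status}\tbusy=${r.busy}`);
  }
}

listRepoRunners('fabfitfun', 'tv-api').catch(console.error);
```

If this list is empty while the job is still executing, the runner agent on the instance presumably never registered (or registered under a name scale-down cannot match to the instance), which would explain the orphan tagging.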
And this is the second pass of scale-down:
2025-01-24 22:00:53 {"level":"INFO","message":"Terminating orphan runner 'i-0425d988f2c7fa6f4'","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T01:00:50.394Z","xray_trace_id":"1-679437bf-375a9fde597887a5102c1220","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"73d294c7-c575-4bde-812b-68a9f2c43ef0","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}
2025-01-24 22:00:53 {"level":"DEBUG","message":"Runner 'i-0425d988f2c7fa6f4' will be terminated.","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T01:00:50.394Z","xray_trace_id":"1-679437bf-375a9fde597887a5102c1220","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"73d294c7-c575-4bde-812b-68a9f2c43ef0","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}
2025-01-24 22:00:53 {"level":"DEBUG","message":"Runner i-0425d988f2c7fa6f4 has been terminated.","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T01:00:50.723Z","xray_trace_id":"1-679437bf-375a9fde597887a5102c1220","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"73d294c7-c575-4bde-812b-68a9f2c43ef0","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}
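From the logs my reading is a two-pass pattern: an instance with no matching GitHub runner is tagged ghr:orphan on one run, and a later run terminates instances still carrying that tag. As a small check between the two passes, here is a sketch (again just a diagnostic, not the module's code) using the AWS SDK v3 EC2 client to see whether an instance currently carries the orphan tag; the helper name isMarkedOrphan is mine:

```typescript
import { EC2Client, DescribeTagsCommand } from '@aws-sdk/client-ec2';

// Hypothetical diagnostic: report whether the given instance is tagged ghr:orphan=true.
async function isMarkedOrphan(instanceId: string): Promise<boolean> {
  const ec2 = new EC2Client({ region: 'us-west-2' }); // region taken from the logs above
  const { Tags } = await ec2.send(
    new DescribeTagsCommand({
      Filters: [
        { Name: 'resource-id', Values: [instanceId] },
        { Name: 'key', Values: ['ghr:orphan'] },
      ],
    }),
  );
  return (Tags ?? []).some((t) => t.Value === 'true');
}

isMarkedOrphan('i-0425d988f2c7fa6f4').then((orphan) => console.log({ orphan }));
```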
What might be wrong? Why is the scale-down script not detecting that the instance is still active?
Any suggestion or comment would be much appreciated.
Fernando.