
EC2 instance runners marked as orphan and deleted even when the job is running. #4391

@ferlosvillas

Description

Hello,
I'm having an issue with the AWS GHA runners: the EC2 instances are always marked as orphan when the scale-down Lambda runs, and when scale-down runs again they are terminated, even though the job is still running.
I'm using version 6.0.1 and deploying everything with the Terraform module.

Here is my test: I run a job that waits for 2 hours, printing a message to the console every minute.
The workflow starts at 21:33.
This is the scale-up log:
2025-01-24 21:34:34 {"level":"INFO","message":"Created instance(s): i-0425d988f2c7fa6f4","sampling_rate":0,"service":"runners-scale-up","timestamp":"2025-01-25T00:34:29.744Z","xray_trace_id":"1-67943191-450de17da578343815965378","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"ec284d3c-9be4-5940-b336-2d32a8272852","function-name":"gha-ondemand-multi-linux-x64-dem-scale-up","runner":{"type":"Repo","owner":"fabfitfun/tv-api","namePrefix":"gha-ondemand-normal-","configuration":{"runnerType":"Repo","runnerOwner":"fabfitfun/tv-api","numberOfRunners":1,"ec2instanceCriteria":{"instanceTypes":["t3a.2xlarge"],"targetCapacityType":"on-demand","instanceAllocationStrategy":"lowest-price"},"environment":"gha-ondemand-multi-linux-x64-dem","launchTemplateName":"gha-ondemand-multi-linux-x64-dem-action-runner","subnets":["subnet-048333888bdbd82e0","subnet-053fb4a528b2cf900","subnet-0f069d8ae24648af2"],"tracingEnabled":false,"onDemandFailoverOnError":[]}},"github":{"event":"workflow_job","workflow_job_id":"36153145706"}}

Then this is the scale-down log (tagging pass):
2025-01-24 21:45:55 {"level":"DEBUG","message":"Found: '0' GitHub runners for AWS runner instance: 'i-0425d988f2c7fa6f4'","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.100Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"} 2025-01-24 21:45:55 {"level":"DEBUG","message":"GitHub runners for AWS runner instance: 'i-0425d988f2c7fa6f4': []","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.100Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"} 2025-01-24 21:45:55 {"level":"DEBUG","message":"Tagging 'i-0425d988f2c7fa6f4'","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.100Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down","tags":[{"Key":"ghr:orphan","Value":"true"}]} 2025-01-24 21:45:55 {"level":"INFO","message":"Runner 'i-0425d988f2c7fa6f4' marked as orphan.","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T00:45:53.388Z","xray_trace_id":"1-6794343b-6e0365590f6c94c11e72aff7","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"523a23f1-54ab-407f-af3f-26d9e01619a8","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}

And this is the second pass of scale-down:
2025-01-24 22:00:53 {"level":"INFO","message":"Terminating orphan runner 'i-0425d988f2c7fa6f4'","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T01:00:50.394Z","xray_trace_id":"1-679437bf-375a9fde597887a5102c1220","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"scale-down","aws-request-id":"73d294c7-c575-4bde-812b-68a9f2c43ef0","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"} 2025-01-24 22:00:53 {"level":"DEBUG","message":"Runner 'i-0425d988f2c7fa6f4' will be terminated.","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T01:00:50.394Z","xray_trace_id":"1-679437bf-375a9fde597887a5102c1220","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"73d294c7-c575-4bde-812b-68a9f2c43ef0","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"} 2025-01-24 22:00:53 {"level":"DEBUG","message":"Runner i-0425d988f2c7fa6f4 has been terminated.","sampling_rate":0,"service":"runners-scale-down","timestamp":"2025-01-25T01:00:50.723Z","xray_trace_id":"1-679437bf-375a9fde597887a5102c1220","region":"us-west-2","environment":"gha-ondemand-multi-linux-x64-dem","module":"runners","aws-request-id":"73d294c7-c575-4bde-812b-68a9f2c43ef0","function-name":"gha-ondemand-multi-linux-x64-dem-scale-down"}

What might be wrong? Why is the scale-down script not detecting that the instance is still active?
Any suggestions or comments would be much appreciated.

Fernando.
