This repository was archived by the owner on Sep 3, 2025. It is now read-only.
With the resourceUtil strategy, traffic transfer does not occur during an update when deployment hits quota limit #203
Closed as not planned
Labels: lifecycle/stale (denotes an issue or PR has remained open with no activity and has become stale)
Description
In a K8s cluster with 1 GPU, we initially apply an isvc with replicas set to 1:
```
NAME                                                  READY   STATUS    RESTARTS   AGE
deploy1-predictor-00001-deployment-868df87c79-6k4sx   2/2     Running   0          79s
```
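For context, here is a minimal sketch of how such an isvc might be created. The model format, storage URI, and resource values are placeholders and not taken from this issue; only the single replica and the one-GPU-per-pod requirement come from the report above.

```sh
# Hypothetical isvc sketch: one replica, one GPU per predictor pod.
# modelFormat and storageUri are placeholders, not from the issue.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: deploy1
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 1
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/model
      resources:
        limits:
          nvidia.com/gpu: "1"
EOF
```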
Then we update the isvc replicas to 2. With this single update, we observe 2 revisions:
- Initial state: Revision 1
- After the update: Revision 2
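The issue does not show the exact update command; the following is a sketch of one way to make that change, assuming the replica count maps to the predictor's minReplicas/maxReplicas fields.

```sh
# Hypothetical update that produces revision 2: raise replicas from 1 to 2.
kubectl patch isvc deploy1 --type merge \
  -p '{"spec": {"predictor": {"minReplicas": 2, "maxReplicas": 2}}}'
```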
Transitions noticed:
- All pods of revision 1 are terminated immediately, but traffic remains directed to revision 1.
- One pod of revision 2 starts running, since only one GPU is available, while the other pod stays Pending. Traffic is still directed to revision 1.
```
NAME                                                 READY   STATUS    RESTARTS   AGE
deploy1-predictor-00002-deployment-5b7d9c4f7-4fwz8   1/2     Running   0          21s
deploy1-predictor-00002-deployment-5b7d9c4f7-w9slk   0/2     Pending   0          21s
```
```
Traffic:
  Latest Revision:  false
  Percent:          0
  Revision Name:    deploy1-predictor-00001
  Latest Revision:  true
  Percent:          100
  Revision Name:    deploy1-predictor-00001
```
- If an inference request is sent in this state (before all containers in one pod of revision 2 are running), Knative spawns a pod for revision 1 because the route still directs traffic to it (reproduced in the sketch below the list).
```
NAME                                                  READY   STATUS    RESTARTS        AGE
deploy1-predictor-00001-deployment-f4cd9f5c4-g6g7h    0/2     Pending   0               3m21s
deploy1-predictor-00002-deployment-587c5d876c-w9slk   0/2     Pending   0               4m5s
deploy1-predictor-00002-deployment-587c5d876c-4fwz8   2/2     Running   4 (2m24s ago)   4m6s
```
```
Traffic:
  Latest Revision:  false
  Percent:          0
  Revision Name:    deploy1-predictor-00001
  Latest Revision:  true
  Percent:          100
  Revision Name:    deploy1-predictor-00001
```
- Since the GPU is held by the revision 2 pod, the revision 1 pod is stuck in a Pending state.
- At this stage, even though the GPU is in use and one revision 2 pod is Running, inference requests fail, and the system does not recover on its own.
This behavior occurs during an update of an isvc that has replicas set to 1 and is deployed in a cluster with no spare resources.
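The pod listings and traffic splits above can be inspected, and an inference request reproduced, with commands along these lines. The Knative service name deploy1-predictor is inferred from the pod names in the listings, and the request URL, model name, and payload are placeholders; the request path follows the KServe v1 protocol.

```sh
# Pods of both revisions, via the standard Knative configuration label.
kubectl get pods -l serving.knative.dev/configuration=deploy1-predictor

# Traffic split on the underlying Knative service (name assumed).
kubectl get ksvc deploy1-predictor -o jsonpath='{.status.traffic}'

# Hypothetical inference request; host, model name, and payload are placeholders.
curl -v http://deploy1.default.example.com/v1/models/deploy1:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0]]}'
```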
Is this the expected behavior for the scenario mentioned above?