Add resource requests for Windows container in Antrea deployment #7254
Conversation
/test-windows-all
@@ -62,6 +62,10 @@ spec:
        - disable
        {{- end}}
        name: antrea-agent
        resources:
          requests:
            cpu: 100m
We have 200m for the Linux container. Any reason why we are using a lower value for the Windows one?
I forgot to update the PR description. Based on our tests, I've never observed such high average CPU usage when workload pods are running stably (the CPU cost is indeed higher while workload pods are starting). May I know the Linux test scenario? Is the Linux resource request based on average or burst usage?
We have had the CPU request for Linux for a long time. I don't remember how it was measured.
However, unless you had a testbed with some ongoing activity, I think it would be better to be conservative and match the Linux value. FQDN policies, network policy audit logging, etc. are things that can consume quite a bit of CPU in a steady way, and you probably didn't test with these features actively used. On Linux we also have the FlowExporter, which could be quite CPU intensive.
Of course, another approach is to say that this request value should really be a baseline, and that users should increase it based on their use case. That being said, we haven't had any complaints about setting it to 200m for Linux, AFAIK.
Thanks for the explanation. I am OK with setting the CPU request to the same value as for the Linux pod.
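For reference, a minimal sketch of what the agreed-upon values could look like in the Windows antrea-agent container spec (the surrounding template context is assumed; values follow this discussion and the final commit message below):

resources:
  requests:
    cpu: 200m      # matches the Linux agent container
    memory: 100Mi  # covers the ~50MB average observed in testing, with headroom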
resources:
  requests:
    cpu: 100m
    memory: 100Mi
Are the memory requests based on some observations for a Windows Node running Antrea?
In my test, I deployed 100 pods on a 4C8G Windows node. During the test, the average agent memory cost was less than 50MB when the workload pods were running stably. At first I used the burst memory cost as the resource request. However, after @wenyingd's review, I agree that such a large value might prevent users from scheduling necessary workloads onto the node. Therefore, it's acceptable for the burst cost to be larger than the resource request: unlike a resource limit, a request won't prevent the container from acquiring more resources once it's scheduled and running on the node. Could you share more insights?
I think it's ok.
Usually, memory should not really vary that much. Once memory is given to a process, it's not always returned to the OS right away, even if the process no longer needs it. So I am a bit surprised that you see big bursts in memory usage.
Yes, the highest burst memory cost was only captured once by the test script. I've never observed such a cost during long-term pod operation, so I'll keep the current memory request value.
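To illustrate the requests-vs-limits distinction discussed above, here is a minimal sketch with hypothetical values; only the request is used for scheduling, while a limit actively caps the container at runtime:

resources:
  requests:
    cpu: 100m        # reserved at scheduling time; the container may still burst above this
    memory: 100Mi
  limits:
    memory: 500Mi    # hard cap: exceeding it gets the container OOM-killed

This is why a memory request below the observed burst is safe here: the burst is not blocked, it just isn't reserved.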
LGTM
@XinShuYang it looks like there is still an issue with manifest generation.
Based on performance testing on a 4C8G Windows node running 100 pods, the antrea-agent container showed an average CPU usage of 30m and memory usage of 50MB, while the OVS container consumed 17m CPU and 23MB memory. To account for potential burst scenarios and ensure runtime stability, both containers have been configured with resource requests of 100m CPU and 100MB memory.

Signed-off-by: Shuyang Xin <[email protected]>
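To reproduce this kind of measurement, per-container usage can be sampled with kubectl top (this assumes metrics-server is installed in the cluster; app=antrea is the label Antrea's manifests typically apply to agent pods, but verify it for your deployment):

kubectl top pod -n kube-system -l app=antrea --containers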
/test-windows-all
…) (#7313) Based on performance testing on a 4C8G Windows node running 100 pods, the antrea-agent container showed an average CPU usage of 30m and memory usage of 50MB, while the OVS container consumed 17m CPU and 23MB memory. To account for potential burst scenarios and ensure runtime stability, memory requests are set to 100MB and CPU requests are set to 200m (except for the install-cni initContainer). The CPU requests match the ones for the Agent on Linux.

Signed-off-by: Shuyang Xin <[email protected]>