@cgbowman

cgbowman commented Oct 3, 2025

During forced composition, work sent to the compute-only queue is starved for resources, competing with the general queue. When resources are freed up for the compute-only queue, the general queue is waiting at a sync point for the compute work to finish.

This can be beneficial for some workloads, but it causes applications running with gamescope's forced composition to slow down when the queues are competing for resources.

By forcing the compositing work to be done in the general queue, we avoid the wait for sync points and the performance loss is not observed.
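A minimal sketch of the selection logic described above, not the actual gamescope change. `QueueFamily`, `QueueChoice`, and `pickQueueFamilies` are hypothetical names introduced for illustration. The flag constants mirror the values of `VK_QUEUE_GRAPHICS_BIT` and `VK_QUEUE_COMPUTE_BIT` so the sketch compiles without `vulkan.h`. Real code would take the families from `vkGetPhysicalDeviceQueueFamilyProperties` and the vendor ID from `VkPhysicalDeviceProperties::vendorID`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Stand-ins so the sketch builds without the Vulkan headers.
constexpr uint32_t kQueueGraphicsBit = 0x1;    // same value as VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kQueueComputeBit  = 0x2;    // same value as VK_QUEUE_COMPUTE_BIT
constexpr uint32_t kVendorIdIntel    = 0x8086; // PCI vendor ID for Intel

// Hypothetical, trimmed-down view of VkQueueFamilyProperties.
struct QueueFamily {
    uint32_t flags;
    uint32_t queueCount;
};

// Hypothetical result: which family index to use for each kind of work.
struct QueueChoice {
    std::optional<uint32_t> general;   // graphics + compute family
    std::optional<uint32_t> composite; // family used for composition work
};

QueueChoice pickQueueFamilies(const std::vector<QueueFamily>& families,
                              uint32_t vendorId) {
    std::optional<uint32_t> general;
    std::optional<uint32_t> computeOnly;

    for (uint32_t i = 0; i < families.size(); ++i) {
        const QueueFamily& f = families[i];
        if (f.queueCount == 0)
            continue;
        const bool gfx  = (f.flags & kQueueGraphicsBit) != 0;
        const bool comp = (f.flags & kQueueComputeBit) != 0;
        if (gfx && comp && !general)
            general = i;      // first family that can do both
        else if (comp && !gfx && !computeOnly)
            computeOnly = i;  // first compute-only family
    }

    // Assumed workaround policy: on Intel, composite on the general family
    // even when a compute-only family exists.
    const bool forceGeneral = (vendorId == kVendorIdIntel);

    QueueChoice choice;
    choice.general = general;
    choice.composite = (computeOnly && !forceGeneral) ? computeOnly : general;
    return choice;
}
```

With one graphics+compute family at index 0 and one compute-only family at index 1, this sketch returns index 1 for composition on a non-Intel vendor ID and index 0 on Intel.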

@msatwood

msatwood commented Oct 3, 2025

@cgbowman I think it would be best to link the data here.

@cgbowman
Author

cgbowman commented Oct 3, 2025

Here is a snapshot of gputop during normal usage of gamescope (no forced composition): [image: mtl-er-gputop-composite-disabled]

Here, rcs usage for Elden Ring is at ~94%.

Here is a snapshot of gputop while gamescope is running with forced composition: [image: mtl-er-gputop-composite-enabled-compute-only-queue]

With the way gamescope currently works on Intel devices, it chooses the compute-only queue, which puts ccs usage at ~9-15% while rcs usage drops to ~78%.

Here is a snapshot of gputop while gamescope is running with forced composition, along with the change to use the general queue: [image: mtl-er-gputop-composite-enabled-general-queue]

With the change, the composition work moves to the general queue and now shows up as activity on rcs (with the same percentages the composition work had on ccs).

Here is a video of Elden Ring on MTL, using the Xe driver, showing what happens before applying the patch:
[video: mtl-er-compute-only-queue.mp4]
Here is the same setup, but choosing the general queue over the compute-only queue:
[video: mtl-er-general-queue.mp4]

@misyltoad
Collaborator

misyltoad commented Oct 3, 2025

This sounds like the compute only queue is just broken and should be disabled in Mesa or fixed somehow.

cgbowman force-pushed the intel-force-general-queue branch from 1f8c6fe to 896318d on October 4, 2025 00:06
@cgbowman
Author

cgbowman commented Oct 4, 2025

> This sounds like the compute only queue is just broken and should be disabled in Mesa or fixed somehow.

I think the syncing mechanisms could definitely use a lot more improvement, but it seems that some applications fare better with lower usage of the compute-only queue (for example, I still see Elden Ring making use of it at times, but ccs usage never goes above 4%). I'm unsure what the effects would be on other applications if we disabled the compute-only queue completely (possibly hangs?). Without broader testing, I think such a move could have negative impacts.

I'll be sure to bring it up to our teams as something to fix, but for the time being, I think this is the best move we can currently make to keep performance from degrading for gamescope specifically and not regress other applications.

Once we have a fix for syncing on the compute-only queue (or a better usage model), I think it would be good to remove this.

@misyltoad
Collaborator

Outside of Gamescope, we also use async compute in a very similar way in Steam Link VR.

It might be worth looking at Doom Eternal which has present from compute also.

I would imagine these apps would be broken in a similar way to how every app under Gamescope + composition is.

@cgbowman
Author

cgbowman commented Oct 4, 2025

> Outside of Gamescope, we also use async compute in a very similar way in Steam Link VR.
>
> It might be worth looking at Doom Eternal which has present from compute also.
>
> I would imagine these apps would be broken in a similar way to how every app under Gamescope + composition is.

If Doom Eternal shows similar symptoms, that could likely bump the severity of the issue. I'll do some profiling on it next week and see if it's being similarly affected. Thanks for the heads up!

@matte-schwartz

matte-schwartz commented Oct 5, 2025

I have Doom Eternal on my Lunar Lake handheld. There's no meaningful difference between present from compute on/off in terms of pure FPS. If anything, it actually performs worse with present from compute disabled if I watch the frame time graph.

Steam Link VR would be an interesting test but I don't have any compatible headsets.

@cgbowman
Author

Checking out Doom Eternal on my device yielded results similar to @matte-schwartz's: CCS usage is around 15% and RCS usage around 90%, but the app runs well. We'll need to look further into the sync mechanisms and why Gamescope's composition is affected more by using CCS than standalone apps like Doom Eternal.

Would this PR suffice as a workaround in the meantime? If it helps, I could add a TODO comment prompting removal once a fix is found.

@misyltoad
Collaborator

Ya, please add a comment explaining why and a log.

cgbowman force-pushed the intel-force-general-queue branch from 896318d to ce21f95 on October 20, 2025 22:19
@matte-schwartz

so the workaround definitely helps with frametiming while compositing in Elden Ring, but some of the numbers on performance with scanout vs composition on Xe are still very rough compared to Steam Deck, especially when compositing a cursor in a high FPS situation.

Below are some numbers from Manor Lords using the same settings on both devices:

[image: manor-lords-settings-20251021133150]

| Device | Scanout | Composition | Cursor composition |
| --- | --- | --- | --- |
| MSI Claw 8 AI+ A2VM (Lunar Lake) | 115 FPS (manor-lords-lnl-scanout-20251021132630) | 99 FPS (manor-lords-lnl-composite-20251021132652) | 83 FPS (manor-lords-lnl-cursor-composite-20251021132706) |
| Steam Deck OLED (AMD Vangogh) | 80 FPS (manor-lords-deck-scanout-20251021132145) | 77 FPS (manor-lords-deck-composite-20251021132247) | 76 FPS (manor-lords-deck-cursor-composite-20251021132408) |

I can also put this in the drm/xe issue if that would help to keep track of things. I still think this workaround is worth landing for now, but ideally there's some more work to be done in bridging this gap a bit more imo.

@msatwood

@matte-schwartz is this with or without @cgbowman's workaround?

@matte-schwartz

matte-schwartz commented Oct 21, 2025

@msatwood this is with the workaround, sorry for not clarifying. I can grab the same data without the workaround in a little bit.

Oct 21 13:10:28 msi-claw8-a2vm gamescope-session[23768]: [gamescope] [Info]  vulkan: Intel device detected, forcing general queue family instead of compute-only queue

added a vk_log to confirm it was being used

@msatwood

No apologies necessary. Given that this is with the workaround, I suspect this is an entirely different issue.

@matte-schwartz

that makes sense, I can file a follow-up issue for it in that case. think mesa or drm/xe would be a better tracker?

@msatwood

I am going to say Mesa

@cgbowman
Author

There are multiple locations where this could be occurring, but yeah, Mesa sounds like the correct place to file this. You can assign to me and I'll follow up with it in there.

@cgbowman
Author

@matte-schwartz, could you include instructions on how to test the cursor composition? I'll include that in my own testing.

@matte-schwartz

@cgbowman I put instructions in here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14159#any-extra-information-would-be-greatly-appreciated, or i can write instructions for how to install an Intel-compatible version of SteamOS in that issue tomorrow.

During forced composition, work sent to the compute-only queue is
starved for resources, competing with the general queue. When resources
are freed up for the compute-only queue, the general queue is waiting at
a sync point for the compute work to finish.

This can be beneficial for some workloads, but it causes applications
running with gamescope's forced composition to slow down when the
queues are competing for resources.

By forcing the compositing work to be done in the general queue, we
avoid the wait for sync points and the performance loss is not observed.

This is a workaround to be removed once resolved.
See: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4452

Signed-off-by: Casey Bowman <[email protected]>
cgbowman force-pushed the intel-force-general-queue branch from ce21f95 to 7da2449 on October 22, 2025 04:46
@cgbowman
Author

> Oct 21 13:10:28 msi-claw8-a2vm gamescope-session[23768]: [gamescope] [Info]  vulkan: Intel device detected, forcing general queue family instead of compute-only queue
>
> added a vk_log to confirm it was being used

My apologies for not adding this. Added the log & the regressed Intel ID (I had made the changes on another machine and didn't see the vendor ID had not updated).

@cgbowman
Author

> @cgbowman I put instructions in here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14159#any-extra-information-would-be-greatly-appreciated, or i can write instructions for how to install an Intel-compatible version of SteamOS in that issue tomorrow.

I think this is sufficient information to get started, I'll let you know if we'll need further instructions after trying to replicate the issue in our environment. I greatly appreciate the detail! :)
