Question about the cuda graph acceleration #558
-
Hi, I notice a slowdown if I launch a captured graph multiple times. The first launch is very quick, but later launches become slow. After some time it becomes quick again, then slow again. Is there a potential reason for this? I guess it may be related to the CUDA memory overwriting strategy? Is there some way in Warp to manually release a previously launched graph?
Replies: 1 comment 3 replies
-
Hi Jianghanxiao, thank you for the question. First, can we see the code that you use for profiling? Accurate profiling of GPU programs is non-trivial, and what you are experiencing sounds like an issue with the measurement rather than with the graph launch itself. For example, once you launch CUDA work, it is not necessarily executed immediately. Sometimes the scheduler waits until there are enough kernel launches or until a wait time elapses. Sometimes the profiler itself adds a non-negligible overhead to the CUDA launch. Sometimes you need a few warm-up launches to bring the GPU's clock up. It is very hard to analyze what is happening unless we see your code.
Sounds like you’re running into the same issue as #277, where wp.ScopedTimer was not measuring the time you expected it to measure. See the profiling docs for how to measure GPU performance more accurately, e.g. using CUDA events.
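To make the points in this thread concrete (asynchronous launches, warm-up, and synchronizing before reading the clock), here is a minimal sketch of a timing helper. It uses host-side wall-clock time with explicit synchronization rather than CUDA events, and the helper name `time_launches` and its parameters are hypothetical, not part of Warp. The commented usage assumes a graph object named `graph` that you have already captured.

```python
import time


def time_launches(launch, synchronize, warmup=3, iters=10):
    """Return mean wall-clock seconds per launch, after warm-up and with explicit sync.

    launch:      callable that enqueues the work (may return before the work finishes)
    synchronize: callable that blocks until all enqueued work has completed
    """
    # Warm-up launches give clocks time to ramp up and let one-time setup finish.
    for _ in range(warmup):
        launch()
    synchronize()  # drain pending work so it is not counted in the measurement

    start = time.perf_counter()
    for _ in range(iters):
        launch()
    synchronize()  # without this, only the enqueue cost would be measured
    return (time.perf_counter() - start) / iters


# Possible usage with Warp (assumes `graph` was captured beforehand):
#   import warp as wp
#   mean_s = time_launches(lambda: wp.capture_launch(graph), wp.synchronize_device)
#   print(f"mean graph launch time: {mean_s * 1e3:.3f} ms")
```

Averaging over several launches after a warm-up should smooth out the fast/slow pattern if it comes from the measurement itself. If the variation persists with this kind of measurement, that would be useful information to post along with the profiling code.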