Commit 15c44a6

Balasubramanian, Vigneshvignbalakvaragan
authored
Adding dynamic thread-setting logic for CGEMM(AOCL_DYNAMIC) (#48)
- Added a set of thresholds (based on input dimensions) that determine and set the ideal number of threads to be used for CGEMM (on ZEN4 and ZEN5 architectures).

- The thread-setting logic is as follows:
  - The underlying single-threaded kernels work on blocks of MRxk of A, kxNR of B and MRxNR of C. Thus, it is initially assumed that the optimal number of threads is ceil(m/MR)*ceil(n/NR). This is an upper bound on the ideal thread count.
  - The actual ideal thread count can be lower than the upper bound, depending on the work that every thread receives. This is mainly determined by the value of 'k'.
  - If 'k' is small, the arithmetic intensity (AI) is low and memory bandwidth becomes the limiting factor, favoring smaller thread counts. In contrast, if 'k' is large, the AI is high and the workload scales well with higher thread counts.
  - So, we limit the number of threads when 'k' is small to avoid bandwidth contention. Using fewer threads gives each thread more bandwidth, improving efficiency. We allow more threads when 'k' is large, since the computation becomes compute-bound rather than bandwidth-bound and therefore benefits from a higher thread count.

- The new logic first sets the upper bound for the optimal number of threads (based on the number of tiles), and then reduces it further based on the values of 'm', 'n' and 'k'. This comes under the 'AOCL_DYNAMIC' feature for CGEMM, specifically for ZEN4 and ZEN5 architectures.

AMD-Internal: [CPUPL-6498]

Co-authored-by: Vignesh Balasubramanian <[email protected]>
Co-authored-by: Varaganti, Kiran <[email protected]>
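
The two-step logic described above (tile-based upper bound, then a reduction for small 'k') can be sketched roughly as follows. This is an illustration only, not the code added by this commit: the `MR`/`NR` values, the 'k' thresholds, the per-range caps, and the function names `tile_upper_bound` and `cgemm_ideal_threads` are all hypothetical, and the sketch reduces the count using 'k' alone, whereas the commit's thresholds also depend on 'm' and 'n'.

```c
#include <stddef.h>

/* Hypothetical micro-tile dimensions; the real values depend on the kernel. */
enum { MR = 4, NR = 12 };

/* Integer ceiling division for b > 0. */
static size_t ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Upper bound on useful threads: one per MR x NR tile of C,
 * i.e. ceil(m/MR) * ceil(n/NR). */
size_t tile_upper_bound(size_t m, size_t n)
{
    return ceil_div(m, MR) * ceil_div(n, NR);
}

/* Reduce the upper bound when k is small (low arithmetic intensity,
 * bandwidth-bound). The thresholds and caps below are made up for
 * illustration and are NOT the values used in the commit. */
size_t cgemm_ideal_threads(size_t m, size_t n, size_t k, size_t max_threads)
{
    size_t nt = tile_upper_bound(m, n);
    size_t cap;

    if (k < 32)
        cap = 8;            /* small k: keep thread count low */
    else if (k < 256)
        cap = 32;           /* medium k: moderate thread count */
    else
        cap = max_threads;  /* large k: compute-bound, allow all threads */

    if (nt > cap)
        nt = cap;
    if (nt > max_threads)
        nt = max_threads;
    if (nt == 0)
        nt = 1;             /* always use at least one thread */

    return nt;
}
```

With these assumed values, a 1000x1000 problem with k = 16 would be limited to 8 threads, while the same problem with k = 1024 would be allowed the full `max_threads`.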
1 parent c81408c commit 15c44a6

File tree

1 file changed

+686
-0
lines changed

