Currently, weight updates are calculated on the Native backend. Profiling shows that about 40% of CPU time is spent in the corresponding BLAS operations. Another 40% is spent in an area without debug info, which is quite likely the NVIDIA driver doing I/O. At the same time, according to nvidia-smi, GPU load is only about 20%, even on my relatively slow GTX 960.
I think a 3x-5x speedup is possible if weight updates are implemented on the GPU. It should be quite easy, since the update is a simple BLAS-style operation y = a * x + b * y, where a and b are scalars and x and y are tensors of equal dimensions.
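For reference, the update described above can be sketched on the CPU as follows. This is only a minimal illustration of the arithmetic, not the project's implementation: the function name `axpby` and the sample values are hypothetical. Each output element depends only on the matching elements of x and y, which is why the operation should map naturally onto one GPU thread per element.

```python
def axpby(a, x, b, y):
    """Return a * x + b * y elementwise (hypothetical helper, plain Python)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    # Every element is computed independently of the others,
    # so on a GPU each index could be handled by its own thread.
    return [a * xi + b * yi for xi, yi in zip(x, y)]


# Illustrative SGD-style step with made-up numbers:
# weights <- 1.0 * weights + (-learning_rate) * gradient
weights = [1.0, 2.0, 3.0]
gradient = [0.5, 0.5, 0.5]
learning_rate = 0.5
updated = axpby(-learning_rate, gradient, 1.0, weights)
print(updated)
```

With standard Level 1 BLAS routines the same result can be obtained by scaling y by b (scal) and then applying axpy, so a GPU BLAS library that offers those two routines would likely be enough.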