Closed as not planned
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), refactor (Can be written more cleanly)
Description
Some of these ideas raise readability concerns, and I'd rather give up a little performance than sacrifice readability. I'd also like more benchmarks (they are fairly minimal as of writing) to evaluate how effective these changes are.
Some initial ideas for optimisation:
- Change the global allocator to jemalloc (http://jemalloc.net/)
- Use uninitialised memory
- Cache FFT twiddle and scale buffers
- Branchless GPU kernels (MulPow is a prime candidate)
- Use async GPU calls so CPU and GPU work can run in parallel
It would be great to hear other ideas.
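On the allocator idea: jemalloc would be plugged in through Rust's `#[global_allocator]` attribute (for example via the `tikv-jemallocator` crate, which is an assumption here and not something the project depends on today). Because the benchmarks are thin, a cheap first step could be to count allocations with a wrapper around the standard `System` allocator and see how allocation-heavy the hot paths really are. `Counting` and `allocation_count` are hypothetical names for this sketch:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counting wrapper: forwards everything to the System allocator
// and counts alloc, alloc_zeroed and realloc calls.
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    // Forwarded explicitly so System's zeroed/in-place paths are preserved.
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc_zeroed(layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.realloc(ptr, layout, new_size) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Swapping this line for a jemalloc-backed allocator is the whole change.
#[global_allocator]
static GLOBAL: Counting = Counting;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box keeps the optimiser from eliding the unused allocation.
    let buf = std::hint::black_box(vec![0u64; 1024]);
    println!("allocations: {}", allocation_count() - before);
    drop(buf);
}
```

If the counts in the hot paths turn out low, an allocator swap probably won't move the needle enough to justify the extra dependency.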
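On uninitialised memory: in Rust this would most likely mean writing into a `Vec`'s spare capacity through `MaybeUninit` instead of zero-filling first (e.g. `vec![0.0; n]`) and then overwriting. A minimal sketch, assuming every element of the output is written exactly once; `fill_uninit` is a hypothetical helper:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: builds a Vec<f32> of length `n` where element `i`
/// is `f(i)`, without zero-filling the allocation first.
fn fill_uninit<F: Fn(usize) -> f32>(n: usize, f: F) -> Vec<f32> {
    let mut v: Vec<f32> = Vec::with_capacity(n);
    // Spare capacity is exposed as uninitialised slots.
    let spare: &mut [MaybeUninit<f32>] = &mut v.spare_capacity_mut()[..n];
    for (i, slot) in spare.iter_mut().enumerate() {
        slot.write(f(i));
    }
    // SAFETY: the loop above initialised all `n` elements.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    let v = fill_uninit(4, |i| (i * i) as f32);
    println!("{:?}", v);
}
```

Given the readability concern, `(0..n).map(f).collect::<Vec<f32>>()` also avoids the zero-fill and has no `unsafe`, so the explicit version probably only pays off where the values can't come from an iterator (for example a GPU read-back or FFI call writing into the buffer). Whether either beats `vec![0.0; n]`, which can use a zeroed allocation, is something the benchmarks would need to show.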
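On caching FFT twiddle and scale buffers: one option is to compute them once per transform size and hand out shared references afterwards. This sketch assumes radix-2 style forward twiddles exp(-2πik/N) for k < N/2 and a scalar 1/N inverse scale; the project's actual layout may differ, and `FftPlan` and `FftCache` are hypothetical names:

```rust
use std::collections::HashMap;
use std::f64::consts::PI;
use std::sync::Arc;

/// Hypothetical precomputed data for an FFT of size `n`.
struct FftPlan {
    /// Forward twiddles exp(-2*pi*i*k/n) for k in 0..n/2, stored as (re, im).
    twiddles: Vec<(f64, f64)>,
    /// Scale applied after an inverse transform.
    inv_scale: f64,
}

impl FftPlan {
    fn new(n: usize) -> Self {
        let twiddles = (0..n / 2)
            .map(|k| {
                let angle = -2.0 * PI * k as f64 / n as f64;
                (angle.cos(), angle.sin())
            })
            .collect();
        FftPlan { twiddles, inv_scale: 1.0 / n as f64 }
    }
}

/// Hypothetical cache holding one plan per transform size.
#[derive(Default)]
struct FftCache {
    plans: HashMap<usize, Arc<FftPlan>>,
}

impl FftCache {
    /// Returns the cached plan for `n`, building it on first use only.
    fn get(&mut self, n: usize) -> Arc<FftPlan> {
        self.plans
            .entry(n)
            .or_insert_with(|| Arc::new(FftPlan::new(n)))
            .clone()
    }
}

fn main() {
    let mut cache = FftCache::default();
    let a = cache.get(8);
    let b = cache.get(8);
    // Both handles point at the same allocation: nothing was recomputed.
    assert!(Arc::ptr_eq(&a, &b));
    println!("{} twiddles, scale {}", a.twiddles.len(), a.inv_scale);
}
```

Since `get` takes `&mut self`, sharing one cache across threads would need a `Mutex` (or a per-thread cache); the `Arc` lets plans outlive the lock once handed out.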
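On branchless kernels: I'm not sure exactly what MulPow computes, so this uses integer-exponent power (exponentiation by squaring) as a stand-in, written as CPU-side Rust rather than as a GPU kernel. The branchless version runs a fixed number of iterations and replaces the per-bit `if` with a bit-mask select, so every lane would follow the same instruction stream. `MAX_BITS`, `select`, `pow_branchy` and `pow_branchless` are hypothetical, and `MAX_BITS = 8` assumes exponents below 256:

```rust
/// Assumed upper bound on exponent bits (exponents < 256).
const MAX_BITS: u32 = 8;

/// Branching reference: data-dependent loop count and an `if` per bit.
fn pow_branchy(mut base: f32, mut e: u32) -> f32 {
    let mut result = 1.0f32;
    while e > 0 {
        if e & 1 == 1 {
            result *= base;
        }
        base *= base;
        e >>= 1;
    }
    result
}

/// Returns `a` if `bit == 1`, `b` if `bit == 0`, via a bit mask.
fn select(bit: u32, a: f32, b: f32) -> f32 {
    let mask = 0u32.wrapping_sub(bit); // all ones if bit == 1, zero otherwise
    f32::from_bits((a.to_bits() & mask) | (b.to_bits() & !mask))
}

/// Branchless variant: fixed trip count, no data-dependent branch.
fn pow_branchless(mut base: f32, e: u32) -> f32 {
    debug_assert!(e < (1 << MAX_BITS), "exponent exceeds MAX_BITS");
    let mut result = 1.0f32;
    for i in 0..MAX_BITS {
        let bit = (e >> i) & 1;
        // Multiplying by exactly 1.0 leaves `result` unchanged.
        result *= select(bit, base, 1.0);
        base *= base;
    }
    result
}

fn main() {
    println!("{} {}", pow_branchy(1.5, 3), pow_branchless(1.5, 3));
}
```

A bit-mask select is used rather than `bit * base + (1 - bit)` because the latter turns into NaN once `base` overflows to infinity (0 × inf). GPU compilers often predicate a short `if` like the one above anyway, so any gain would likely come from the uniform trip count rather than the select itself; that is worth benchmarking before trading away readability.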
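On CPU+GPU parallelism: the idea is to start a GPU call and do independent CPU work before waiting on its result. To stay within the standard library, this sketch swaps async for `std::thread::scope` and simulates the GPU call with a sleep; `gpu_mul`, `cpu_prepare` and `overlapped` are hypothetical stand-ins, not the project's API:

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a blocking GPU call (launch plus read-back).
fn gpu_mul(a: &[f32], b: &[f32]) -> Vec<f32> {
    thread::sleep(Duration::from_millis(10)); // simulated transfer/kernel latency
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

/// Hypothetical CPU work that doesn't depend on the GPU result.
fn cpu_prepare(n: usize) -> Vec<f32> {
    (0..n).map(|i| i as f32 * 0.5).collect()
}

/// Runs the "GPU" call on a scoped thread while the CPU work proceeds.
fn overlapped(a: &[f32], b: &[f32], n: usize) -> (Vec<f32>, Vec<f32>) {
    thread::scope(|s| {
        let gpu = s.spawn(|| gpu_mul(a, b));
        let cpu = cpu_prepare(n); // overlaps with the GPU call
        (gpu.join().expect("GPU thread panicked"), cpu)
    })
}

fn main() {
    let (gpu, cpu) = overlapped(&[1.0, 2.0], &[3.0, 4.0], 3);
    println!("{:?} {:?}", gpu, cpu);
}
```

If the GPU backend offers non-blocking submission, the same shape should carry over to its own async primitives without a dedicated thread.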