Speed and memory optimization ideas #8

With some of these I'm concerned about readability. I'd rather sacrifice a little performance than readability. I'd also like more benchmarks (they are currently minimal) so we can evaluate how effective these changes actually are.

Some initial ideas for optimisation:

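- Switch the global allocator to jemalloc (http://jemalloc.net/)
- Use uninitialised memory for buffers that are fully overwritten anyway, instead of zeroing them first
- Cache FFT twiddle factors and scale buffers
- Make GPU kernels branchless (MulPow is a prime candidate)
- Issue GPU calls asynchronously so CPU and GPU work can run in parallel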
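Assuming the crate is Rust, the global allocator is swapped with the `#[global_allocator]` attribute. The minimal sketch below wraps the standard `System` allocator in a hypothetical `CountingAlloc` that counts allocations, which could also serve as a cheap metric for the benchmarks. Trying jemalloc would mean pointing the static at the allocator type from a jemalloc binding crate instead (for example `tikv-jemallocator`; which crate to use is an assumption, not something decided here).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper: forwards to the system allocator and counts `alloc` calls.
// To try jemalloc, the `#[global_allocator]` static would use the jemalloc type instead.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Number of allocations performed so far by the whole program.
fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    // black_box stops the optimiser from eliding the allocation.
    let v: Vec<f32> = black_box((0..1024).map(|i| i as f32).collect());
    let after = allocations();
    println!("{} allocation(s) for {} floats", after - before, v.len());
}
```

Counting allocations before and after a hot path gives a quick signal of whether an allocator change or buffer reuse is actually reducing allocator traffic, independent of wall-clock noise.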
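A minimal sketch of the uninitialised-memory idea, assuming Rust and a buffer whose every element is written before it is read. `fill_uninit` is a hypothetical helper: it reserves capacity, writes each slot through `MaybeUninit`, and only then sets the length, so no zeroing pass happens.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper: builds a Vec<f32> of length `n` without zero-filling it first.
// Every slot is written exactly once before `set_len` exposes it.
fn fill_uninit(n: usize, f: impl Fn(usize) -> f32) -> Vec<f32> {
    let mut v: Vec<f32> = Vec::with_capacity(n);
    // `spare_capacity_mut` exposes the allocated-but-uninitialised slots.
    let spare: &mut [MaybeUninit<f32>] = v.spare_capacity_mut();
    for (i, slot) in spare[..n].iter_mut().enumerate() {
        slot.write(f(i));
    }
    // SAFETY: `with_capacity(n)` guarantees capacity >= n, and the loop above
    // initialised the first `n` elements.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    let squares = fill_uninit(5, |i| (i * i) as f32);
    println!("{:?}", squares);
}
```

Since this needs `unsafe`, it trades some readability for speed. Note that `vec![0.0; n]` may already get cheap pre-zeroed memory from the allocator, so it is worth benchmarking before converting call sites.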
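Caching twiddle and scale data could look roughly like this, assuming Rust and a radix-2 FFT that needs twiddles `exp(-2πik/n)` for `k` in `0..n/2` plus a `1/n` scale for the inverse transform. `FftPlan` and `FftPlanCache` are hypothetical names, and the real code's buffer layout may differ. Plans are keyed by transform size and shared via `Arc`, so repeated transforms of the same length skip the `sin`/`cos` work.

```rust
use std::collections::HashMap;
use std::f64::consts::PI;
use std::sync::Arc;

// Hypothetical per-size data for a radix-2 FFT.
struct FftPlan {
    // twiddles[k] = exp(-2πi·k/n) stored as (re, im), for k in 0..n/2
    twiddles: Vec<(f64, f64)>,
    // Scale applied after an inverse transform.
    inverse_scale: f64,
}

// Caches plans by transform size so twiddles are computed once per size.
#[derive(Default)]
struct FftPlanCache {
    plans: HashMap<usize, Arc<FftPlan>>,
}

impl FftPlanCache {
    fn get(&mut self, n: usize) -> Arc<FftPlan> {
        let plan = self.plans.entry(n).or_insert_with(|| {
            let twiddles = (0..n / 2)
                .map(|k| {
                    let theta = -2.0 * PI * k as f64 / n as f64;
                    (theta.cos(), theta.sin())
                })
                .collect();
            Arc::new(FftPlan { twiddles, inverse_scale: 1.0 / n as f64 })
        });
        Arc::clone(plan)
    }
}

fn main() {
    let mut cache = FftPlanCache::default();
    let a = cache.get(8);
    let b = cache.get(8);
    // The second lookup returns the same shared plan without recomputing it.
    assert!(Arc::ptr_eq(&a, &b));
    println!("{} twiddles, inverse scale {}", a.twiddles.len(), a.inverse_scale);
}
```

If the cache has to be shared across threads, it would need to sit behind a `Mutex` (or be kept per thread); the same idea would apply to GPU-side twiddle buffers, uploaded once per size.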

It would be great to hear other ideas.
