Speed and memory optimization ideas #8

@andrewmilson

I'm concerned about readability with some of these. I'd prefer to sacrifice a little performance for readability. I'd also like more benchmarks (they are rather minimal as of writing this) to evaluate the effectiveness of these changes.

Some initial ideas for optimisation:

  • Change global allocator to http://jemalloc.net/
  • Use uninitialised memory
  • Cache FFT twiddle and scale buffers
  • Branchless GPU kernels (MulPow is a prime candidate)
  • Use async with GPU calls for CPU+GPU parallelism
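To make two of these concrete (uninitialised memory and cached twiddles), here is a rough sketch using only the standard library. It assumes the codebase is Rust, and every name in it (`TwiddleCache`, `compute_twiddles`, `pow_mod`, the prime `P` and generator `G`) is hypothetical and chosen for illustration; the field is a small NTT-friendly prime, not necessarily the one this project uses.

```rust
use std::collections::HashMap;

/// Illustrative NTT-friendly prime, 119 * 2^23 + 1 (an assumption, not the project's field).
const P: u64 = 998_244_353;
/// A generator of the multiplicative group modulo P.
const G: u64 = 3;

/// Modular exponentiation by squaring. Products stay below 2^60, so u64 is enough.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Computes the n/2 twiddle factors for a size-n transform.
/// The buffer is allocated without zero-filling and written exactly once.
fn compute_twiddles(n: usize) -> Vec<u64> {
    assert!(n.is_power_of_two() && n >= 2 && n <= 1 << 23);
    let root = pow_mod(G, (P - 1) / n as u64);
    let half = n / 2;
    let mut out: Vec<u64> = Vec::with_capacity(half);
    let mut w = 1u64;
    // The spare capacity may exceed `half`, so only the first `half` slots are used.
    for slot in out.spare_capacity_mut()[..half].iter_mut() {
        slot.write(w);
        w = w * root % P;
    }
    // SAFETY: the loop above initialised exactly the first `half` elements.
    unsafe { out.set_len(half) };
    out
}

/// Caches twiddle tables by transform size so repeated FFTs reuse them.
#[derive(Default)]
struct TwiddleCache {
    tables: HashMap<usize, Vec<u64>>,
}

impl TwiddleCache {
    /// Returns the table for size `n`, computing it only on the first request.
    fn get(&mut self, n: usize) -> &[u64] {
        self.tables
            .entry(n)
            .or_insert_with(|| compute_twiddles(n))
            .as_slice()
    }
}
```

Calling `TwiddleCache::default()` once and then `get(n)` before each transform would pay the table cost only on the first call per size. The `unsafe` block is the kind of readability cost mentioned above, so it probably only earns its place if benchmarks show the zero-fill actually matters.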

It would be great to hear other ideas.

Labels: enhancement, help wanted, refactor
