
[Help Wanted] Optimize binary matmul kernel #5

@chromecast56

Description

The current kernel is fairly slow compared to the theoretical optimum, considering the small memory footprint of the weight deltas. Right now it functions more as a proof of concept (e.g., it outperforms naive simultaneous inference). We can expect an additional 4-8x latency improvement if it's further optimized.
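For context, the arithmetic trick a binary (±1 weight-delta) matmul kernel can exploit is sketched below in NumPy. This is a hypothetical illustration, not the repo's kernel: the helper names are made up, and a real GPU kernel would operate on the packed bits directly with popcount instructions rather than unpacking. The identity used is that for w ∈ {-1, +1}, `x @ w` equals `2 * (sum of x over +1 positions) - sum(x)`, which is why packed 1-bit storage is enough.

```python
import numpy as np

def pack_signs(w):
    """Pack a {-1, +1} weight matrix into bits (bit = 1 means +1),
    8 rows per byte along axis 0."""
    bits = (w > 0).astype(np.uint8)
    return np.packbits(bits, axis=0)

def binary_matvec(packed, x, n_rows):
    """Compute y = x @ w for w in {-1, +1} from its packed sign bits.

    Since each weight is +1 or -1:
      y_j = (sum of x_i where w_ij = +1) - (sum of x_i where w_ij = -1)
          = 2 * pos_j - sum(x)
    """
    bits = np.unpackbits(packed, axis=0, count=n_rows).astype(x.dtype)
    pos = x @ bits          # sum of x over the +1 positions of each column
    return 2.0 * pos - x.sum()

# Example: a 16x4 sign matrix stored in 2x4 bytes instead of 64 floats.
w = np.where(np.random.rand(16, 4) > 0.5, 1.0, -1.0)
x = np.random.rand(16)
y = binary_matvec(pack_signs(w), x, n_rows=16)
assert np.allclose(y, x @ w)
```

An optimized kernel would fuse the unpack step away entirely (e.g., XNOR + popcount over packed words), which is where the remaining latency headroom presumably lives.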

I don't have much kernel optimization experience yet, though - if anyone in the OSS community is interested, I'd love some help!

Afterwards, it'd be super interesting to run some benchmarks against LoRA-based multi-tenant systems like Punica/S-LoRA.


Labels

enhancement (New feature or request)
