Closed
Labels: enhancement (New feature or request)
Description
The current kernel is fairly slow compared to the theoretical optimum, considering the small memory footprint of the weight deltas. Right now it functions more as a proof of concept (e.g., it outperforms naive simultaneous inference). We can likely expect an additional 4-8x latency improvement with further optimization.
I don't have much kernel optimization experience yet, though - if anyone in the OSS community is interested, would love some help!
Afterwards, it'd be super interesting to run some benchmarks against LoRA-based multi-tenant systems like Punica/S-LoRA.
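For anyone picking this up, here is a rough way to put a number on the "theoretical optimum" mentioned above. This is only a back-of-the-envelope sketch: it assumes the kernel is purely memory-bound (latency is at least bytes read divided by memory bandwidth), and the function name and all parameters are hypothetical placeholders, not values from this repo.

```python
def min_latency_us(num_params: int, bits_per_param: float, bandwidth_gb_s: float) -> float:
    """Lower bound (microseconds) on streaming a set of weights from memory once.

    Assumption: the kernel is memory-bound, so it cannot finish faster than
    the time needed to read every parameter at peak bandwidth.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (bandwidth_gb_s * 1e9) * 1e6


def headroom(measured_us: float, num_params: int, bits_per_param: float,
             bandwidth_gb_s: float) -> float:
    """How many times slower the measured kernel is than the bandwidth bound."""
    return measured_us / min_latency_us(num_params, bits_per_param, bandwidth_gb_s)
```

Plugging in the delta size, its bit width, the GPU's peak bandwidth, and a measured kernel time gives a rough idea of how much room is left for optimization.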