|2024.04|🔥[**HOMER**] Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs(@KAIST)|[[pdf]](https://arxiv.org/abs/2404.10308)|[[HOMER]](https://github.com/alinlab/HOMER)|⭐️⭐️ |
|2024.05|🔥🔥[**YOCO**] You Only Cache Once: Decoder-Decoder Architectures for Language Models(@Microsoft)|[[pdf]](https://arxiv.org/pdf/2405.05254)|[[unilm-YOCO]](https://github.com/microsoft/unilm/tree/master/YOCO)|⭐️⭐️ |
|2024.05|🔥🔥[SKVQ] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models(@Shanghai AI Laboratory)|[[pdf]](https://arxiv.org/pdf/2405.06219)| ⚠️ |⭐️⭐️ |
|2023.12|[PowerInfer] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU(@SJTU)|[[pdf]](https://ipads.se.sjtu.edu.cn/_media/publications/powerinfer-20231219.pdf)|[[PowerInfer]](https://github.com/SJTU-IPADS/PowerInfer)|⭐️ |
|2024.01|[**Admm Pruning**] Fast and Optimal Weight Update for Pruned Large Language Models(@fmph.uniba.sk)|[[pdf]](https://arxiv.org/pdf/2401.02938.pdf)|[[admm-pruning]](https://github.com/fmfi-compbio/admm-pruning)|⭐️ |
|2024.01|[FFSplit] FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference(@Rice University etc) |[[pdf]](https://arxiv.org/pdf/2401.04044.pdf)| ⚠️ |⭐️|
|2025.03|🔥[**Simba**] Sparsified State-Space Models are Efficient Highway Networks(@KAIST)|[[pdf]](https://arxiv.org/abs/2505.20698)|[[Simba]](https://github.com/woominsong/Simba)|⭐️ |