A curated list of LLMs and related studies targeted at mobile and embedded hardware
Last update: 19th October 2025
If your publication/work is not included - and you think it should be - please open an issue or reach out directly to @stevelaskaridis.
Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.
- Mobile-First LLMs
- Infrastructure / Deployment of LLMs on Device
- Benchmarking LLMs on Device
- Mobile-Specific Optimisations
- Applications
- Multimodal LLMs
- Surveys on Efficient LLMs
- Training LLMs on Device
- Mobile-Related Use-Cases
- Benchmarks
- Leaderboards
- Industry Announcements
- Books/Courses
- Related Organized Workshops
- Related Awesome Repositories
The following table lists sub-3B models designed for on-device deployment, sorted by year; a minimal loading example follows the table.
| Name | Year | Sizes | Primary Group/Affiliation | Publication | Code Repository | HF Repository |
|---|---|---|---|---|---|---|
| 2025 | ||||||
| MobileLLM-Pro | 2025 | 1B | Meta | - | - | huggingface |
| MobileLLM-R1 | 2025 | 140M, 360M, 950M | Meta | paper | code | huggingface |
| SmolLM3 | 2025 | 3B | HuggingFace | blog | code | huggingface |
| Qwen-3 | 2025 | 0.6B, 1.7B, ... | Qwen Team | paper | code | huggingface |
| ParetoQ | 2025 | 125M, 350M, 600M, 1B, 1.5B, 3B | Meta | paper | code | huggingface |
| 2024 | ||||||
| BlueLM-V | 2024 | 2.7B | CUHK, Vivo AI Lab | paper | code | - |
| PhoneLM | 2024 | 0.5B, 1.5B | BUPT | paper | code | huggingface |
| AMD-Llama-135m | 2024 | 135M | AMD | blog | code | huggingface |
| SmolLM2 | 2024 | 135M, 360M, 1.7B | HuggingFace | - | code | huggingface |
| Ministral | 2024 | 3B, ... | Mistral | blog | - | huggingface |
| Llama 3.2 | 2024 | 1B, 3B | Meta | blog | code | huggingface |
| OLMoE | 2024 | 7B (1B active) | AllenAI | paper | code | huggingface |
| Spectra | 2024 | 99M - 3.9B | NolanoAI | paper | code | huggingface |
| Gemma 2 | 2024 | 2B, ... | Google | paper, blog | code | huggingface |
| Apple Intelligence Foundation LMs | 2024 | 3B | Apple | paper | - | - |
| SmolLM | 2024 | 135M, 360M, 1.7B | HuggingFace | blog | - | huggingface |
| Fox | 2024 | 1.6B | TensorOpera | blog | - | huggingface |
| Qwen2 | 2024 | 500M, 1.5B, ... | Qwen Team | paper | code | huggingface |
| OpenELM | 2024 | 270M, 450M, 1.08B, 3.04B | Apple | paper | code | huggingface |
| DCLM | 2024 | 400M, 1B, ... | University of Washington, Apple, Toyota Research Institute, ... | paper | code | huggingface |
| Phi-3 | 2024 | 3.8B | Microsoft | whitepaper | code | huggingface |
| BitNet-b1.58 | 2024 | 1.3B, 3B, ... | Microsoft | paper | code | huggingface |
| OLMo | 2024 | 1B, ... | AllenAI | paper | code | huggingface |
| MobileLLM | 2024 | 125M, 350M | Meta | paper | code | - |
| Gemma | 2024 | 2B, ... | Google | paper, website | code, gemma.cpp | huggingface |
| MobiLlama | 2024 | 0.5B, 1B | MBZUAI | paper | code | huggingface |
| Stable LM 2 (Zephyr) | 2024 | 1.6B | Stability AI | paper | - | huggingface |
| TinyLlama | 2024 | 1.1B | Singapore University of Technology and Design | paper | code | huggingface |
| Gemini-Nano | 2024 | 1.8B, 3.25B | Google | paper | - | - |
| 2023 | ||||||
| Stable LM (Zephyr) | 2023 | 3B | Stability AI | blog | code | huggingface |
| OpenLM | 2023 | 11M, 25M, 87M, 160M, 411M, 830M, 1B, 3B, ... | OpenLM team | - | code | huggingface |
| Phi-2 | 2023 | 2.7B | Microsoft | website | - | huggingface |
| Phi-1.5 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
| Phi-1 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
| RWKV | 2023 | 169M, 430M, 1.5B, 3B, ... | EleutherAI | paper | code | huggingface |
| Cerebras-GPT | 2023 | 111M, 256M, 590M, 1.3B, 2.7B ... | Cerebras | paper | code | huggingface |
| LaMini-LM | 2023 | 61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 774M, 738M, 783M, 1.3B, 1.5B, ... | MBZUAI | paper | code | huggingface |
| Pythia | 2023 | 70M, 160M, 410M, 1B, 1.4B, 2.8B, ... | EleutherAI | paper | code | huggingface |
| 2022 | ||||||
| OPT | 2022 | 125M, 350M, 1.3B, 2.7B, ... | Meta | paper | code | huggingface |
| Galactica | 2022 | 125M, 1.3B, ... | Meta | paper | code | huggingface |
| BLOOM | 2022 | 560M, 1.1B, 1.7B, 3B, ... | BigScience | paper | code | huggingface |
| 2021 | ||||||
| XGLM | 2021 | 564M, 1.7B, 2.9B, ... | Meta | paper | code | huggingface |
| GPT-Neo | 2021 | 125M, 350M, 1.3B, 2.7B | EleutherAI | - | code, gpt-neox | huggingface |
| 2020 | ||||||
| MobileBERT | 2020 | 15.1M, 25.3M | CMU, Google | paper | code | huggingface |
| 2019 | ||||||
| BART | 2019 | 140M, 400M | Meta | paper | code | huggingface |
| DistilBERT | 2019 | 66M | HuggingFace | paper | code | huggingface |
| T5 | 2019 | 60M, 220M, 770M, 3B, ... | Google | paper | code | huggingface |
| TinyBERT | 2019 | 14.5M | Huawei | paper | code | huggingface |
| Megatron-LM | 2019 | 336M, 1.3B, ... | Nvidia | paper | code | - |
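Most of the checkpoints above can be pulled straight from the Hugging Face Hub and exercised with the `transformers` library before committing to a mobile runtime. The sketch below is a minimal example of this; the repository id (`HuggingFaceTB/SmolLM2-135M-Instruct`) and the generation settings are illustrative assumptions, so substitute any model from the table.

```python
# Minimal sketch: load a sub-3B model from the Hugging Face Hub and generate text.
# The model id below is an assumption for illustration; replace it with any entry from the table.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed repo id, ~135M parameters
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What can small language models run on?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```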
This section showcases frameworks and contributions for supporting LLM inference on mobile and edge devices.
- llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++. Supports various platforms and builds on top of ggml, with models distributed in the GGUF format; see the Python sketch after this list.
- LLMFarm: iOS frontend for llama.cpp
- LLM.swift: iOS frontend for llama.cpp
- Sherpa: Android frontend for llama.cpp
- iAkashPaul/Portal: Wraps the example Android app with a tweaked UI, configs and additional model support
- dusty-nv's llama.cpp: Containers for Jetson deployment of llama.cpp
- MLC-LLM: MLC LLM is a machine learning compiler and high-performance deployment engine for large language models. Supports various platforms and builds on top of TVM; see the Python sketch after this list.
- Android App: MLC Android app
- iOS App: MLC iOS app
- dusty-nv's MLC: Containers for Jetson deployment of MLC
- PyTorch ExecuTorch: Solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers.
- TorchChat: Codebase showcasing the ability to run large language models (LLMs) seamlessly across iOS and Android
- Google MediaPipe: A suite of libraries and tools for quickly applying artificial intelligence (AI) and machine learning (ML) techniques in your applications. Supports Android, iOS, Python and Web.
- GoogleAI-Edge Gallery: Experimental app that puts the power of cutting-edge Generative AI models directly into your hands, running entirely on your Android and iOS devices.
- Apple MLX: MLX is an array framework for machine learning research on Apple silicon, developed by Apple machine learning research. Builds on lazy evaluation and a unified memory architecture; see the text-generation sketch after this list.
- MLX Swift: Swift API for MLX.
- HF Swift Transformers: Swift Package to implement a transformers-like API in Swift
- Alibaba MNN: MNN is a lightweight deep learning framework that supports on-device inference and training of deep learning models.
- llama2.c (more educational; see here for an Android port)
- tinygrad: Simple neural network framework from tinycorp and @geohot
- TinyChatEngine: Targeted at Nvidia, Apple M1 and RPi, from Song Han's (MIT) group.
- Llama Stack (swift, kotlin): These libraries are a set of SDKs that provide a simple and effective way to integrate AI capabilities into your iOS/Android app, whether it is local (on-device) or remote inference.
- OLMoE.Swift: Ai2 OLMoE is an AI chatbot powered by the OLMoE model. Unlike cloud-based AI assistants, OLMoE runs entirely on your device, ensuring complete privacy and offline accessibility—even in Flight Mode.
- HuggingSnap: HuggingSnap is an iOS app that lets users quickly learn more about the places and objects around them. HuggingSnap runs SmolVLM2, a compact open multimodal model that accepts arbitrary sequences of images, videos, and text inputs to produce text outputs.
- Flower Intelligence: Flower Intelligence is a cross-platform inference library that lets users seamlessly interact with Large-Language Models both locally and remotely in a secure and private way. The library was created by the Flower Labs team. It supports TypeScript, JavaScript and Swift backends.
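As referenced in the llama.cpp entry above, here is a minimal sketch of running a GGUF model via the community llama-cpp-python bindings; the model path and sampling parameters are placeholders, and the mobile frontends listed above (LLMFarm, Sherpa, etc.) wrap equivalent native calls.

```python
# Minimal sketch using llama-cpp-python, the community Python bindings around llama.cpp.
# The GGUF path below is a placeholder; use any checkpoint quantised for your target device.
from llama_cpp import Llama

llm = Llama(
    model_path="models/smollm2-135m-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,   # context window
    n_threads=4,  # CPU threads; tune for the phone/SBC at hand
)

out = llm("Q: Why run LLMs on device? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```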
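Similarly, for the MLC-LLM entry above: besides its Android/iOS apps, MLC-LLM ships an OpenAI-style Python engine. The snippet below follows the pattern of the project's quick-start documentation; the pre-compiled model id is an assumption, so swap in any MLC-packaged model.

```python
# Sketch of MLC-LLM's OpenAI-compatible Python engine; the model id is an assumed example.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC"  # assumed pre-compiled MLC model
engine = MLCEngine(model)

for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Why does on-device inference matter?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)

engine.terminate()
```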
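Likewise, for the Apple MLX entry above: on Apple silicon the mlx-lm package (built on MLX) provides a compact load/generate API. A minimal sketch, assuming a 4-bit community conversion exists under the repository id shown:

```python
# Sketch using mlx-lm on Apple silicon; the repo id is an assumed community conversion.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-0.5B-Instruct-4bit")  # assumed 4-bit MLX checkpoint
text = generate(
    model,
    tokenizer,
    prompt="Explain on-device LLM inference in one sentence.",
    max_tokens=64,
)
print(text)
```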
- Apple Intelligence Foundation Language Models: Tech Report 2025 (paper)
- [ACM Queue] Generative AI at the Edge: Challenges and Opportunities: The next phase in AI deployment (paper)
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone (paper, code)
- [MobiCom'24] Mobile Foundation Model as Firmware (paper, code)
- Merino: Entropy-driven Design for Generative Language Models on IoT Devices (paper)
- LLM as a System Service on Mobile Devices (paper)
- LLMCad: Fast and Scalable On-device Large Language Model Inference (paper)
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (paper)
This section focuses on measurements and benchmarking efforts for assessing LLM performance when deployed on device.
- [ICLR'25] PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms (paper)
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation (paper)
- [EdgeFM @ MobiSys'24] Large Language Models on Mobile Devices: Measurements, Analysis, and Insights (paper)
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases (paper)
- [MobiCom'24] MELTing point: Mobile Evaluation of Language Transformers (paper, talk, code)
This section focuses on techniques and optimisations that target mobile-specific deployment.
- [CVPR'25 EDGE Workshop] Scaling On-Device GPU Inference for Large Generative Models (paper)
- ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM (paper)
- [ASPLOS'25] Fast On-device LLM Inference with NPUs (paper, code)
- Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference (paper)
- PhoneLM: An Efficient and Capable Small Language Model Family through Principled Pre-training (paper, code)
- MobileQuant: Mobile-friendly Quantization for On-device Language Models (paper, code)
- Gemma 2: Improving Open Language Models at a Practical Size (paper, code)
- Apple Intelligence Foundation Language Models (paper)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (paper, code)
- Gemma: Open Models Based on Gemini Research and Technology (paper, code)
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (paper, code)
- [ICML'24] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (paper, code)
- [ICML'24] Rethinking Optimization and Architecture for Tiny Language Models (paper, code)
- TinyLlama: An Open-Source Small Language Model (paper, code)
- Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent (paper)
- Octopus v2: On-device language model for super agent (paper)
- Towards an On-device Agent for Text Rewriting (paper)
This section refers to multimodal LLMs, which integrate vision or other modalities in their tasks.
- [CVPR'24] MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (paper)
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models (paper, code)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (paper, code)
This section includes survey papers on LLM efficiency, a topic closely related to deployment on constrained devices.
- GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices (paper)
- Small Language Models (SLMs) Can Still Pack a Punch: A survey (paper)
- A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness (paper)
- Small Language Models: Survey, Measurements, and Insights (paper)
- On-Device Language Models: A Comprehensive Review (paper)
- A Survey of Resource-efficient LLM and Multimodal Foundation Models (paper)
- Efficient Large Language Models: A Survey (paper, code)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems (paper)
- A Survey on Model Compression for Large Language Models (paper)
This section refers to papers attempting to train/fine-tune LLMs on device, in a standalone or federated manner.
- Computational Bottlenecks of Training Small-scale Large Language Models (paper)
- [ICML'25] On-device collaborative language modeling via a mixture of generalists and specialists (paper)
- MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning (paper)
- [Privacy in Natural Language Processing @ ACL'24] PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs (paper)
- [MobiCom'23] Federated Few-Shot Learning for Mobile NLP (paper, code)
- FwdLLM: Efficient FedLLM using Forward Gradient (paper, code)
- [Electronics'24] Forward Learning of Large Language Models by Consumer Devices (paper)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly (paper)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (paper, code)
This section includes papers that are mobile-related, but do not necessarily run on device.
- Small Language Models are the Future of Agentic AI (paper)
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (paper)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (paper, code)
- [MobiCom'24] MobileGPT: Augmenting LLM with Human-like App Memory for Mobile Task Automation (paper)
- [MobiCom'24] AutoDroid: LLM-powered Task Automation in Android (paper, code)
- [NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control (paper, code)
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation (paper, code)
- [ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences (paper)
- Edge AI Engineering by Marcelo Rovai
- Machine Learning Systems: Principles and Practices of Engineering Artificially Intelligent Systems by Vijay Janapa Reddi
- WWDC'24 - Apple Foundation Models
- PyTorch ExecuTorch Alpha
- Google - LLMs On-Device with MediaPipe and TFLite
- Qualcomm - The future of AI is Hybrid
- ARM - Generative AI on mobile
- TTODLer-FM @ ICML'25: Tiny Titans: The next wave of On-Device Learning for Foundational Models
- ES-FoMO @ ICML'25: Efficient Systems for Foundation Models
- Binary Networks @ ICCV'25: Binary and Extreme Quantization for Computer Vision
- SLLM @ ICLR'25: Workshop on Sparsity in LLMs: Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
- MCDC @ ICLR'25: Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning
- Adaptive Foundation Models @ NeurIPS'24
If you want to read more about related topics, here are some tangential awesome repositories to visit:
- NexaAI/Awesome-LLMs-on-device on LLMs on Device
- Hannibal046/Awesome-LLM on Large Language Models
- KennethanCeyer/awesome-llm on Large Language Models
- HuangOwen/Awesome-LLM-Compression on Large Language Model Compression
- csarron/awesome-emdl on Embedded and Mobile Deep Learning
Contributions welcome! Read the contribution guidelines first.