Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment

Gaole Dai¹,², Shiqi Jiang¹, Ting Cao¹, Yuanchun Li³, Yuqing Yang¹, Rui Tan², Mo Li⁴, Lili Qiu¹

¹ Microsoft Research ² Nanyang Technological University ³ AIR @ Tsinghua University ⁴ Hong Kong University of Science & Technology

Overview

Introducing V-Droid, the first mobile GUI agent with near-real-time, high-quality decision-making ability. Unlike traditional agents that rely on large language models (LLMs) to generate actions at every step, V-Droid employs LLMs as verifiers that evaluate candidate actions to ensure high-quality decisions.

V-Droid features:

Discretized Action Space & Prefilling-Only Workflow: Accelerates decision-making by verifying candidate actions in parallel using prefix caching.
Pair-Wise Progress Preference Training: Enhances the verifier’s decision-making and self-correction capabilities through progress-aware training.
Scalable Human-Agent Joint Annotation: V-Droid quickly takes the lead role in the annotation process after just two training rounds, significantly reducing overhead while boosting performance.
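
The verifier-driven idea behind the features above can be illustrated with a minimal sketch. This is not the V-Droid implementation (the code has not been released); every name here (`select_action`, `pairwise_preference_loss`, `toy_score`) is hypothetical. The sketch assumes the verifier can be treated as a function that maps a candidate action to a scalar score, and it uses a Bradley-Terry style pairwise loss as a stand-in for preference training, which may differ from the objective used in the paper. Candidates are scored sequentially here; parallel verification and prefix caching are not modeled.

```python
import math
from typing import Callable, Sequence, Tuple


def select_action(
    candidates: Sequence[str],
    score_fn: Callable[[str], float],
) -> Tuple[str, float]:
    """Score every candidate action and return the highest-scoring one."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scored = [(action, score_fn(action)) for action in candidates]
    return max(scored, key=lambda pair: pair[1])


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).

    Assumption: a generic pairwise objective, not necessarily the paper's.
    """
    margin = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-margin)) for either sign.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def toy_score(action: str) -> float:
    """Hypothetical stand-in for an LLM verifier: counts task keywords."""
    task_words = ("settings", "wifi")
    return float(sum(word in action.lower() for word in task_words))


# A discretized action space: a finite list of candidate actions for one step.
candidates = ["tap 'Settings'", "tap 'Camera'", "scroll down", "tap 'WiFi settings'"]
best_action, best_score = select_action(candidates, toy_score)
print(best_action, best_score)
```

Because the action space is a finite list, choosing an action reduces to scoring and taking the maximum, so the model never has to generate free-form output. The loss is small when the action that makes progress is scored above the alternative and grows as the ordering is reversed.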

V-Droid has set new benchmarks in mobile task automation, achieving state-of-the-art task success rates of 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, outperforming existing agents by 9.5%, 2.1%, and 9%, respectively. Furthermore, V-Droid achieves a low latency of 0.7 seconds per decision, which is 32.8× faster than existing agents.

Code & Weights

The complete codebase and model weights will be released shortly. Stay tuned!

Citation

If you use this work, please cite:

@article{dai2025advancing,
  title={Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment},
  author={Dai, Gaole and Jiang, Shiqi and Cao, Ting and Li, Yuanchun and Yang, Yuqing and Tan, Rui and Li, Mo and Qiu, Lili},
  journal={arXiv preprint arXiv:2503.15937},
  year={2025}
}
