Haiyang Mei, Difei Gao, Xiaopeng Wei, Xin Yang, Mike Zheng Shou
TrustScorer evaluates the trustworthiness of GUI agent actions and enables selective human intervention when the action trust score is low, combining human precision with AI efficiency.
TrustScorer takes as input the user query q, the subtask description d, the action sequence s, and the state observation o, and outputs a trustworthiness label l indicating the likelihood that the action sequence can accomplish the specified subtask.
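A minimal sketch of this interface, assuming a Python API; the class, function, and field names below (ScoringInput, score, run_with_intervention) and the threshold value are illustrative placeholders, not the released code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoringInput:
    query: str          # user query q
    subtask: str        # subtask description d
    actions: List[str]  # action sequence s
    observation: str    # state observation o (e.g., a serialized screenshot/UI state)


def score(x: ScoringInput) -> float:
    """Return a trust score in [0, 1]; higher means the action sequence
    is more likely to accomplish the subtask. Stub only: the released
    model would replace this."""
    return 0.0


TRUST_THRESHOLD = 0.5  # illustrative value, not specified in the paper


def run_with_intervention(x: ScoringInput) -> None:
    """Selective human intervention: escalate only low-trust sequences."""
    if score(x) < TRUST_THRESHOLD:
        print("Low trust score: requesting human review.")
    else:
        print("High trust score: executing actions:", x.actions)
```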
TrustBench includes 106 specific tasks across 9 commonly used applications and 718 agent action sequences with corresponding ground-truth annotations.
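One way a single benchmark record could be organized, assuming a JSON layout; all field names here are hypothetical until the dataset is released in December 2025:

```python
import json

# Hypothetical layout of one TrustBench record; the actual schema
# will be defined by the dataset release.
record = {
    "application": "PPT",        # one of the 9 applications
    "task": "...",               # one of the 106 specific tasks
    "query": "...",              # user query q
    "subtask": "...",            # subtask description d
    "actions": ["...", "..."],   # one of the 718 action sequences s
    "observation": "...",        # state observation o
    "label": 1,                  # ground-truth trustworthiness label l
}
print(json.dumps(record, indent=2))
```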
An example from TrustBench on PPT:
The annotation pipeline:
TrustBench will be released in December 2025.
We will release the training, testing, and evaluation code around the end of November 2025.
Our work builds upon AssistGUI.
If you use TrustScorer/TrustBench in your research, please cite it using the following BibTeX entry:
@InProceedings{Mei_2025_MM_TrustScorer,
    author    = {Mei, Haiyang and Gao, Difei and Wei, Xiaopeng and Yang, Xin and Shou, Mike Zheng},
    title     = {Can I Trust You? Advancing GUI Task Automation with Action Trust Score},
    booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
    year      = {2025},
}
Please see the LICENSE file.
E-Mail: Haiyang Mei ([email protected])