ACM MM 2025: Can I Trust You? Advancing GUI Task Automation with Action Trust Score

Show Lab @ NUS

Haiyang Mei, Difei Gao, Xiaopeng Wei, Xin Yang, Mike Zheng Shou

[Paper] [BibTeX]

1. TrustScorer

TrustScorer evaluates the trustworthiness of GUI agent actions and enables selective human intervention when the action trust score is low, combining human precision with AI efficiency.

TrustScorer takes as input the user query q, the subtask description d, the action sequence s, and the state observation o, and outputs a trustworthiness label l indicating the likelihood that the action sequence accomplishes the specified subtask.
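The interface below is a minimal sketch of this input/output contract, not the released implementation; the names `ScorerInput`, `score_action_sequence`, and `TRUST_THRESHOLD` are hypothetical placeholders, and only the inputs q, d, s, o and the output label l come from the description above.

```python
# Minimal interface sketch; names are hypothetical, not the released code.
from dataclasses import dataclass
from typing import List


@dataclass
class ScorerInput:
    query: str            # user query q
    subtask: str          # subtask description d
    actions: List[str]    # action sequence s, e.g. serialized GUI actions
    observation: str      # state observation o, e.g. a screenshot caption or UI tree dump


TRUST_THRESHOLD = 0.5     # hypothetical cut-off for requesting human intervention


def score_action_sequence(x: ScorerInput) -> float:
    """Stand-in for the learned TrustScorer model; returns a trust score in [0, 1]."""
    raise NotImplementedError("replace with the released TrustScorer model")


def assign_label(x: ScorerInput) -> str:
    """Map the trust score to a label l and route low-trust sequences to a human."""
    score = score_action_sequence(x)
    return "trusted" if score >= TRUST_THRESHOLD else "needs_human_review"
```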

2. TrustBench

TrustBench comprises 106 tasks from 9 commonly used applications, together with 718 agent action sequences and their corresponding ground-truth annotations.
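As an illustration of how one annotated action sequence might be stored, the record below is a hypothetical sketch: every field name and value is an assumption, and the released annotation format may differ.

```python
# Hypothetical TrustBench record layout; field names and values are
# assumptions, not the released annotation schema.
example_record = {
    "application": "PPT",                            # one of the 9 applications
    "subtask": "Open the Insert tab",                # subtask description d
    "actions": [                                     # one of the 718 action sequences
        {"type": "click", "target": "Insert tab"},
        {"type": "click", "target": "Table"},
    ],
    "observation": "path/to/state_observation.png",  # state observation o
    "label": "trusted",                              # ground-truth trustworthiness label l
}
```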

An example from TrustBench on PPT:

The annotation pipeline:

TrustBench will be released in December 2025.

3. Implementation

We will release the training, testing, and evaluation code around the end of November 2025.

4. Acknowledgements

Our work builds upon AssistGUI.

5. Citation

If you use TrustScorer/TrustBench in your research, please use the following BibTeX entry.

@InProceedings{Mei_2025_MM_TrustScorer,
    author    = {Mei, Haiyang and Gao, Difei and Wei, Xiaopeng and Yang, Xin and Shou, Mike Zheng},
    title     = {Can I Trust You? Advancing GUI Task Automation with Action Trust Score},
    booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
    year      = {2025},
}

6. License

Please see LICENSE.

7. Contact

E-Mail: Haiyang Mei ([email protected])
