-
Notifications
You must be signed in to change notification settings - Fork 1.2k
img diff algo for screenshots #1072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
img diff algo for screenshots #1072
Conversation
🦋 Changeset detectedLatest commit: ac4cb00 The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR replaces the existing time-based screenshot service with an intelligent image-diff based approach that uses MSE (Mean Squared Error) and SSIM (Structural Similarity Index) algorithms to determine if UI changes occurred. The new system intercepts agent screenshots and only stores them if they differ significantly from previous captures, reducing storage overhead while maintaining evaluation quality.
Key changes:
• Added Sharp dependency for image processing and comparison algorithms
• Implemented screenshot interception that overrides the page.screenshot method
• Added MSE/SSIM-based image comparison with configurable thresholds
• Introduced optimized screenshot capture with viewport resizing for performance
• Changed default trial count from 3 to 1 for faster evaluation cycles
Confidence score: 3/5
- This PR has moderate risk due to potential race conditions from viewport manipulation
- Score reflects solid implementation of image comparison algorithms but critical issue with concurrent viewport modification that could cause unpredictable behavior in multi-threaded scenarios
- Pay close attention to evals/utils/imageResize.ts - the viewport manipulation approach needs revision
6 files reviewed, 1 comment
Co-authored-by: Miguel <[email protected]>
691334d to
9c5f69a
Compare
evals/evaluator.ts
Outdated
| import { LLMParsedResponse } from "@/lib/inference"; | ||
| import { LLMResponse } from "@/lib/llm/LLMClient"; | ||
| import { LogLine } from "@/types/log"; | ||
| import { z } from "zod"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zod3
15f737a to
4d9ecf2
Compare
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated. # Releases ## @browserbasehq/[email protected] ### Patch Changes - [#1082](#1082) [`8c0fd01`](8c0fd01) Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object to agent instead of stagehand page - [#1104](#1104) [`a1ad06c`](a1ad06c) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for stagehand agent - [#1066](#1066) [`9daa584`](9daa584) Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright arguments to agent execute response - [#1077](#1077) [`7f38b3a`](7f38b3a) Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for stagehand agent in the api - [#1032](#1032) [`bf2d0e7`](bf2d0e7) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer dependency support - [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator handler with base of new agent - [#1089](#1089) [`536f366`](536f366) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs on api session create - [#1103](#1103) [`889cb6c`](889cb6c) Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool support in anthropic cua client - [#1056](#1056) [`6a002b2`](6a002b2) Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for duplicate project id if already passed to Stagehand - [#1090](#1090) [`8ff5c5a`](8ff5c5a) Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed act error logs - [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator agent with scaffold for new stagehand agent - [#1107](#1107) [`3ccf335`](3ccf335) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url extraction not working inside an array - [#1102](#1102) [`a99aa48`](a99aa48) Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page and date context to agent - [#1110](#1110) [`dda52f1`](dda52f1) Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for new Gemini Computer Use models ## @browserbasehq/[email protected] ### Minor Changes - [#1057](#1057) [`b7be89e`](b7be89e) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added web voyager ground truth (optional), added web bench, and subset of OSWorld evals which run on a browser ### Patch Changes - [#1072](#1072) [`dc2d420`](dc2d420) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve evals screenshot service - add img hashing diff to add screenshots and change to screenshot intercepts from the agent - Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]: - @browserbasehq/[email protected] ## @browserbasehq/[email protected] ### Patch Changes - Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]: - @browserbasehq/[email protected] Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
why
Our existing screenshot service is a dummy time-based triggered service. It also does not trigger based on any actions of the agent.
what changed
Added img hash diff algo (quick check with MSE, verify with SSIM algo) to see if there was an actual UI change and only store ss in the buffer if that is so.
Added ss interceptor which copies each screenshot the agent is taking to a buffer (if different enough from the previous ss) to be later used for evals.
test plan
Tests pass locally