Custom Agent

Add custom agent for evaluation

Web-Bench supports custom agents: during evaluation, the built-in 'http' agent mode calls your custom agent's API.

Before setting up a custom agent, complete the Web-Bench installation.

During the "Call Agent" step, the 'http' agent will:

Pass the Evaluator's context to your agent.
Return your agent's response to the Evaluator without modification.

Therefore, the request and response formats of your CustomAgent must adhere to the following interfaces:

export interface AgentRequest {
  type: 'normal' | 'init'

  task: string

  // Code files, key is filePath, value is fileContent
  files?: Record<string, string>

  // Error context
  error?: string
}


export interface AgentResponse {
  // Code files, key is filePath, value is fileContent
  files: Record<string, string>

  // Rejected alternative: [filePath: string]: string (poor extensibility)
}
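
As a rough illustration of the contract above, the following sketch validates an incoming request body and produces a response in the required shape. It assumes the request and response travel as JSON, which this page does not state explicitly. The names `isAgentRequest`, `runAgent`, and `handleRequestBody` are hypothetical and not part of Web-Bench, and the route, port, and HTTP framework are deliberately left out because they are not specified here.

```typescript
// Interfaces copied from the definitions above.
interface AgentRequest {
  type: 'normal' | 'init'
  task: string
  files?: Record<string, string>
  error?: string
}

interface AgentResponse {
  files: Record<string, string>
}

// Hypothetical type guard: checks that a parsed JSON value has the AgentRequest shape.
function isAgentRequest(value: unknown): value is AgentRequest {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  if (v.type !== 'normal' && v.type !== 'init') return false
  if (typeof v.task !== 'string') return false
  if (v.error !== undefined && typeof v.error !== 'string') return false
  if (v.files !== undefined) {
    const files = v.files
    if (typeof files !== 'object' || files === null || Array.isArray(files)) return false
    const map = files as Record<string, unknown>
    for (const path of Object.keys(map)) {
      if (typeof map[path] !== 'string') return false
    }
  }
  return true
}

// Placeholder agent logic (assumption): returns the incoming files unchanged.
// A real agent would generate or edit files based on request.task and request.error.
function runAgent(request: AgentRequest): AgentResponse {
  return { files: { ...(request.files || {}) } }
}

// Hypothetical entry point: takes the raw request body and returns the raw response body.
function handleRequestBody(body: string): string {
  const parsed: unknown = JSON.parse(body)
  if (!isAgentRequest(parsed)) {
    throw new Error('Request body does not match AgentRequest')
  }
  return JSON.stringify(runAgent(parsed))
}
```

A function like `handleRequestBody` could be called from whichever HTTP server you use to expose your agent. Keeping validation and agent logic separate from the transport makes the contract easy to test without starting a server.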

