feat(BA-2606): Add subagent layer to agents #6268
Please note that this code was written with the assistance of Claude Code, so there may be some imperfections that I've missed. Please let me know if you find any issues.
This change introduces a new abstraction layer named "subagents", which emulates multiple agent backend instances within a single agent deployment. Users can now specify sub_agents in the unified configuration to list the configurations of the subagent instances to be spawned. The agent server automatically creates the agent instances according to this configuration and routes RPC calls to them. Routing is not yet complete: if the arguments of an RPC call include a kernel ID, image name, or a similar identifier, the request is directed to the matching subagent. Otherwise, the request is routed to the default agent, which this change defines as the first subagent in the configuration. Including the subagent ID in RPC requests is future work.
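The routing rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `SubAgent`, `AgentRouter`, and the `kernel_id`/`image` argument names are hypothetical, and each subagent is assumed to know which kernels and images it owns.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class SubAgent:
    """Hypothetical stand-in for one agent backend instance."""
    id: str
    kernel_ids: set[str] = field(default_factory=set)
    images: set[str] = field(default_factory=set)


class AgentRouter:
    """Pick a subagent for an RPC call based on its arguments (sketch only)."""

    def __init__(self, agents: list[SubAgent]) -> None:
        if not agents:
            raise ValueError("at least one subagent must be configured")
        # dicts preserve insertion order, so config order is kept
        self.agents = {agent.id: agent for agent in agents}
        # The default agent is the first subagent in the configuration.
        self.default_agent_id = agents[0].id

    def select(self, kwargs: Mapping[str, Any]) -> SubAgent:
        # 1. A kernel ID identifies the subagent that owns the kernel.
        kernel_id: Optional[str] = kwargs.get("kernel_id")
        if kernel_id is not None:
            for agent in self.agents.values():
                if kernel_id in agent.kernel_ids:
                    return agent
        # 2. Otherwise, try to match on the image name.
        image: Optional[str] = kwargs.get("image")
        if image is not None:
            for agent in self.agents.values():
                if image in agent.images:
                    return agent
        # 3. Nothing identifies a subagent: fall back to the default agent.
        return self.agents[self.default_agent_id]
```

For example, with subagents `a0` and `a1` where `a1` owns kernel `k1`, a call with `kernel_id="k1"` goes to `a1`, while a call with no identifying arguments goes to `a0`.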
This change adds support for generating TOML arrays of tables in the sample config generator, which is required for generating the subagent configuration. It also fixes some subtle bugs where the descriptions of certain optional fields were omitted. Note that sample_generator.py was vibe coded with Claude Code, so there may be subtle bugs in the generation. Care has been taken to ensure that the code generates a correct config file.
This change introduces the OverridableContainerConfig class to represent container configs that subagents can override.
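One way to model "overridable" settings is a parallel class whose fields default to `None`, meaning "inherit the global value". The sketch below illustrates that pattern with plain dataclasses; `ContainerConfig`, `ContainerConfigOverride`, their fields, and `merge_container_config` are all hypothetical and do not reflect the actual fields of OverridableContainerConfig.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class ContainerConfig:
    """Hypothetical global container settings shared by all subagents."""
    scratch_root: str = "/tmp/scratches"
    port_range: tuple[int, int] = (30000, 31000)


@dataclass(frozen=True)
class ContainerConfigOverride:
    """Hypothetical per-subagent overrides; None means 'inherit global'."""
    scratch_root: Optional[str] = None
    port_range: Optional[tuple[int, int]] = None


def merge_container_config(
    base: ContainerConfig, override: ContainerConfigOverride
) -> ContainerConfig:
    # Copy only the fields the subagent explicitly set.
    changes = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    # replace() returns a new frozen instance; the global config is untouched.
    return replace(base, **changes)
```

Keeping the merged result a separate frozen object means each subagent gets its own effective config while the global defaults remain shared and unchanged.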
This change fixes an uncaught error in agent creation, where the config passed into the constructor was split into separate AgentGlobalConfig and AgentSpecificConfig objects rather than a single AgentUnifiedConfig. This was a remnant of an older version of this change, in which the constructor signature of AbstractAgent was modified. That modification broke some implicit contracts of its subclasses and of how they used config objects (especially when dumping the config with pickle).
This change also makes all DockerAgent instances share a single global MetadataServer instance. A DockerAgent should not create its own MetadataServer, since doing so would lead to unintended resource contention when multiple DockerAgent instances are created.
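A simple way to guarantee a single shared instance is to create and start the metadata server once in the server process and pass it into every agent. The sketch below assumes that dependency-injection style; `MetadataServer`, `DockerAgentSketch`, and `create_agents` are hypothetical stand-ins, not the real classes or their constructor signatures.

```python
import asyncio


class MetadataServer:
    """Hypothetical stand-in: only one instance should bind its endpoint."""

    def __init__(self) -> None:
        self.started = False
        self.registered_agents: list[str] = []

    async def start(self) -> None:
        if self.started:
            raise RuntimeError("metadata server already started")
        self.started = True

    def register_agent(self, agent_id: str) -> None:
        self.registered_agents.append(agent_id)


class DockerAgentSketch:
    """Hypothetical agent that receives the shared server instead of creating one."""

    def __init__(self, agent_id: str, metadata_server: MetadataServer) -> None:
        self.id = agent_id
        self.metadata_server = metadata_server
        metadata_server.register_agent(agent_id)


async def create_agents(
    agent_ids: list[str],
) -> tuple[MetadataServer, list[DockerAgentSketch]]:
    server = MetadataServer()
    await server.start()  # started exactly once, regardless of agent count
    agents = [DockerAgentSketch(agent_id, server) for agent_id in agent_ids]
    return server, agents
```

Because the server is owned by the caller rather than by any one agent, its lifetime no longer depends on which agent happens to be created or shut down first.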
"registry": {
str(kern_id): _ensure_serializable(kern.__getstate__())
for kern_id, kern in self.agent.kernel_registry.items()
for agent in self.agents.values()
Here (and below, on line 518), when creating the snapshot object, I flatten the kernel registries and allocs stored across all the subagents. Is this acceptable?
@HyeockJinKim @achimnol
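The flattening in question can be sketched as a merge of per-subagent registries into one dict keyed by kernel ID. This is an illustration only: `flatten_registries` is a hypothetical helper, and the duplicate check is an added safeguard, not something the PR is known to do.

```python
from typing import Any, Mapping


def flatten_registries(
    registries: Mapping[str, Mapping[str, Any]],
) -> dict[str, Any]:
    """Merge per-subagent kernel registries into one snapshot dict.

    `registries` maps a subagent ID to that subagent's {kernel_id: state}.
    Raises ValueError if the same kernel ID appears under two subagents,
    since the flattened snapshot could no longer tell them apart.
    """
    flat: dict[str, Any] = {}
    owner: dict[str, str] = {}
    for agent_id, registry in registries.items():
        for kernel_id, state in registry.items():
            key = str(kernel_id)
            if key in flat:
                raise ValueError(
                    f"kernel {key} found in both {owner[key]} and {agent_id}"
                )
            flat[key] = state
            owner[key] = agent_id
    return flat
```

One trade-off worth weighing: the flat form discards which subagent owned each kernel, so restoring a snapshot must rediscover ownership. Nesting the snapshot by subagent ID would preserve it at the cost of changing the snapshot format.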
for agent_config in agent_configs
]
agents = [task.result() for task in tasks]
self._default_agent_id = agents[0].id
I chose the default agent to be the first agent defined in the subagents config. Is this acceptable?
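When agents are created concurrently, "first" should mean first in the config, not first to finish initializing. The sketch below shows why collecting results by input order keeps that well-defined; it uses `asyncio.gather` as a swapped-in technique (the PR's snippet collects results from a list of tasks), and `AgentConfig`, `Agent`, `create_agent`, and `create_all` are hypothetical.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass
class AgentConfig:
    id: str


@dataclass
class Agent:
    id: str


async def create_agent(config: AgentConfig) -> Agent:
    # Simulate initialization that takes a variable amount of time.
    await asyncio.sleep(random.uniform(0, 0.01))
    return Agent(id=config.id)


async def create_all(configs: list[AgentConfig]) -> tuple[list[Agent], str]:
    if not configs:
        raise ValueError("no subagents configured")
    agents = await asyncio.gather(*(create_agent(c) for c in configs))
    # gather() returns results in argument order, not completion order,
    # so agents[0] always corresponds to the first configured subagent.
    return list(agents), agents[0].id
```

The same property holds for the task-list approach in the snippet above, as long as results are read back in the order the tasks were created.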
This change fixes a bug where the RPC calls for destroying kernels and purging containers did not return proper responses, leading to incorrect behavior such as infinite kernel-destruction retry loops or unexpected exceptions when purging non-existent containers.
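The general fix pattern is to make these handlers idempotent: always return an explicit response, even when the target is already gone, so callers can stop retrying. The sketch below illustrates that pattern only; `AgentStateSketch` and its response shapes (`status`, `purged`, `missing`) are hypothetical and are not the actual RPC response formats.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentStateSketch:
    """Hypothetical in-memory state; not the real agent's data structures."""
    kernels: dict[str, Any] = field(default_factory=dict)
    containers: set[str] = field(default_factory=set)

    def destroy_kernel(self, kernel_id: str) -> dict[str, str]:
        # Report an already-destroyed kernel as a normal outcome so the
        # caller does not keep retrying the destruction.
        if kernel_id not in self.kernels:
            return {"kernel_id": kernel_id, "status": "already-destroyed"}
        del self.kernels[kernel_id]
        return {"kernel_id": kernel_id, "status": "destroyed"}

    def purge_containers(self, container_ids: list[str]) -> dict[str, list[str]]:
        # Skip IDs that no longer exist instead of raising, and report them.
        purged: list[str] = []
        missing: list[str] = []
        for cid in container_ids:
            if cid in self.containers:
                self.containers.discard(cid)
                purged.append(cid)
            else:
                missing.append(cid)
        return {"purged": purged, "missing": missing}
```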
I tested locally by following the steps below. The steps are deliberately much more detailed than necessary, mainly for posterity.
Pull Request Overview
This PR introduces a new subagent abstraction layer to enable multiple agent backend instances within the same agent deployment. Users can now specify sub-agent configurations in the unified configuration, allowing the agent server to automatically create agent instances according to the configuration and route RPC calls appropriately.
- Adds new configuration schema for defining multiple subagents with inheritance from global defaults
- Implements agent selection and routing logic in the RPC server to find appropriate agents by kernel ID, image name, etc.
- Refactors agent initialization to support multiple agent instances with shared metadata server
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/agent/test_config_validation.py | Comprehensive tests for subagent configuration validation and inheritance patterns |
| tests/agent/test_config_server.py | Tests for agent RPC server multi-agent mode functionality and agent selection logic |
| src/ai/backend/common/configs/sample_generator.py | Enhanced TOML config generator to support array of tables syntax and runtime field handling |
| src/ai/backend/agent/server.py | Major refactor to support multiple agents with routing logic and shared metadata server |
| src/ai/backend/agent/docker/agent.py | Updates to support shared metadata server across multiple agent instances |
| src/ai/backend/agent/config/unified.py | New configuration schema with subagent support and global/specific config separation |
| configs/agent/sample.toml | Updated sample configuration file with new subagent structure |
| changes/6268.feature.md | Feature changelog entry |
This change removes non-determinism in sample config generation by always sorting list-like attribute values, since sample values provided as a set do not have a consistent ordering.
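For context, the iteration order of a set of strings can differ between Python processes because string hashing is randomized per process (see PYTHONHASHSEED), so emitting a set as-is can change the generated file from run to run. A minimal sketch of the normalization, using a hypothetical `normalize_sample_value` helper and assuming the elements are mutually comparable:

```python
from typing import Any


def normalize_sample_value(value: Any) -> Any:
    """Return list-like values as a sorted list so output is stable.

    Assumes elements are mutually comparable (e.g. all str or all int);
    sorted() raises TypeError for mixed, unorderable element types.
    """
    if isinstance(value, (list, tuple, set, frozenset)):
        return sorted(value)
    return value
```

Sorting every list-like value, as described above, is the simplest rule. If element order in some list-valued field were meaningful, restricting the sort to `set`/`frozenset` would be the more conservative choice.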
This PR got quite bloated, and the follow-up PR for updating the RPC functions would actually undo much of the change made here. I'll create new tickets/issues that better match the new agent runtime <-> agent design, and open new PRs that separate the changes more cleanly.
resolves #6144 (BA-2606)