Commit 4149e38

feat(langchain/createAgent): add todo middleware (#9051)
1 parent 3b4d86b commit 4149e38

2 files changed

+366
-0
lines changed

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
import { describe, it, expect, vi, type MockInstance } from "vitest";
2+
import { AIMessage, HumanMessage } from "@langchain/core/messages";
3+
import { BaseChatModel } from "@langchain/core/language_models/chat_models";
4+
import { createAgent } from "../../index.js";
5+
import { todoMiddleware } from "../todo.js";
6+
7+
function createMockModel(name = "ChatAnthropic", modelType = "anthropic") {
8+
// Minimal stub cast to BaseChatModel (a plain object, not an actual subclass)
9+
const mockModel = {
10+
getName: () => name,
11+
bindTools: vi.fn().mockReturnThis(),
12+
_streamResponseChunks: vi.fn().mockReturnThis(),
13+
bind: vi.fn().mockReturnThis(),
14+
invoke: vi.fn().mockResolvedValue(new AIMessage("Response from model")),
15+
lc_runnable: true,
16+
_modelType: modelType,
17+
_generate: vi.fn(),
18+
_llmType: () => modelType,
19+
} as unknown as BaseChatModel;
20+
mockModel.withStructuredOutput = vi.fn().mockReturnValue(mockModel);
21+
22+
return mockModel;
23+
}
24+
25+
describe("todoMiddleware", () => {
26+
it("should add the system prompt to the model request", async () => {
27+
const middleware = todoMiddleware();
28+
const model = createMockModel();
29+
const agent = createAgent({
30+
model,
31+
middleware: [middleware] as const,
32+
});
33+
34+
const result = await agent.invoke({
35+
messages: [new HumanMessage("Hello, world!")],
36+
});
37+
38+
expect(result.todos).toEqual([]);
39+
const [systemMessage] = (model.invoke as unknown as MockInstance).mock
40+
.calls[0][0];
41+
expect(systemMessage.content).toContain("## `write_todos`\n\nYou have ");
42+
});
43+
});
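The assertion in this test relies on how a mock records its calls: `mock.calls[0]` is the argument list of the first call, `mock.calls[0][0]` is that call's first argument (the message array), and the array destructuring picks out its first element. The following is a minimal, dependency-free sketch of that lookup. The `createRecorder` helper and the `MessageLike` shape are hypothetical stand-ins for `vi.fn()` and the LangChain message classes, and the `"system"` role on the first message is an assumption made for illustration.

```typescript
// Hypothetical message shape; a stand-in for the LangChain message classes.
interface MessageLike {
  role: "system" | "user";
  content: string;
}

// Hypothetical call recorder; a stand-in for vi.fn() that only tracks arguments.
function createRecorder<Args extends unknown[]>() {
  const calls: Args[] = [];
  const fn = (...args: Args): void => {
    calls.push(args);
  };
  return { fn, calls };
}

const invoke = createRecorder<[MessageLike[]]>();

// Simulate the agent calling the model once with a system message
// (assumed to come first) followed by the user's message.
invoke.fn([
  { role: "system", content: "## `write_todos`\n\nYou have access to the `write_todos` tool" },
  { role: "user", content: "Hello, world!" },
]);

// calls[0] is the argument list of the first call,
// calls[0][0] is the first argument of that call (the message array).
const firstCallArgs = invoke.calls[0];
const messageList = firstCallArgs[0];

// Destructuring takes the first message of the array.
const [systemMessage] = messageList;

console.log(systemMessage.role);
```

Naming the destructured value after what it holds (a single message rather than a list) keeps the later `.content` access easy to read.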
Lines changed: 323 additions & 0 deletions
@@ -0,0 +1,323 @@
1+
import { z } from "zod";
2+
import { Command } from "@langchain/langgraph";
3+
import { tool } from "@langchain/core/tools";
4+
import { ToolMessage } from "@langchain/core/messages";
5+
6+
import { createMiddleware } from "../../index.js";
7+
8+
/**
9+
* Description for the write_todos tool
10+
* Ported exactly from Python WRITE_TODOS_DESCRIPTION
11+
*/
12+
const WRITE_TODOS_DESCRIPTION = `Use this tool to create and manage a structured task list for your current work session. This helps you track progress, organize complex tasks, and demonstrate thoroughness to the user.
13+
It also helps the user understand the progress of the task and overall progress of their requests.
14+
Only use this tool if you think it will be helpful in staying organized. If the user's request is trivial and takes less than 3 steps, it is better to NOT use this tool and just do the task directly.
15+
16+
## When to Use This Tool
17+
Use this tool in these scenarios:
18+
19+
1. Complex multi-step tasks - When a task requires 3 or more distinct steps or actions
20+
2. Non-trivial and complex tasks - Tasks that require careful planning or multiple operations
21+
3. User explicitly requests todo list - When the user directly asks you to use the todo list
22+
4. User provides multiple tasks - When users provide a list of things to be done (numbered or comma-separated)
23+
5. The plan may need future revisions or updates based on results from the first few steps. Keeping track of this in a list is helpful.
24+
25+
## How to Use This Tool
26+
1. When you start working on a task - Mark it as in_progress BEFORE beginning work.
27+
2. After completing a task - Mark it as completed and add any new follow-up tasks discovered during implementation.
28+
3. You can also update future tasks, such as deleting them if they are no longer necessary, or adding new tasks that are necessary. Don't change previously completed tasks.
29+
4. You can make several updates to the todo list at once. For example, when you complete a task, you can mark the next task you need to start as in_progress.
30+
31+
## When NOT to Use This Tool
32+
It is important to skip using this tool when:
33+
1. There is only a single, straightforward task
34+
2. The task is trivial and tracking it provides no benefit
35+
3. The task can be completed in less than 3 trivial steps
36+
4. The task is purely conversational or informational
37+
38+
## Examples of When to Use the Todo List
39+
40+
<example>
41+
User: I want to add a dark mode toggle to the application settings. Make sure you run the tests and build when you're done!
42+
Assistant: I'll help add a dark mode toggle to your application settings. Let me create a todo list to track this implementation.
43+
*Creates todo list with the following items:*
44+
1. Create dark mode toggle component in Settings page
45+
2. Add dark mode state management (context/store)
46+
3. Implement CSS-in-JS styles for dark theme
47+
4. Update existing components to support theme switching
48+
5. Run tests and build process, addressing any failures or errors that occur
49+
*Begins working on the first task*
50+
51+
<reasoning>
52+
The assistant used the todo list because:
53+
1. Adding dark mode in and of itself is a multi-step feature requiring UI, state management, and styling changes
54+
2. The assistant inferred that tests and build need to pass and made running them, and addressing any failures, the final task
55+
3. Both of the user's requests are complex and require multiple steps to complete.
56+
</reasoning>
57+
</example>
58+
59+
<example>
60+
User: Help me plan a comprehensive marketing campaign for our new product launch, including social media, email marketing, and press releases.
61+
Assistant: I'll help you plan a comprehensive marketing campaign for your product launch. Let me create a todo list to organize all the components.
62+
*Creates todo list with the following items:*
63+
1. Define target audience and key messaging
64+
2. Create social media content calendar and posting schedule
65+
3. Design email marketing sequence and automation flows
66+
4. Draft press release and identify media contacts
67+
5. Set up tracking and analytics for campaign performance
68+
6. Create visual assets and brand materials
69+
7. Coordinate launch timeline and deliverables
70+
*Begins working on the first task*
71+
72+
<reasoning>
73+
The assistant used the todo list because:
74+
1. Marketing campaign planning involves multiple distinct channels and activities
75+
2. Each component requires careful coordination and planning
76+
3. The systematic approach ensures all aspects of the launch are covered
77+
</reasoning>
78+
</example>
79+
80+
<example>
81+
User: Help me rename the function getCwd to getCurrentWorkingDirectory across my project
82+
Assistant: Let me first search through your codebase to find all occurrences of 'getCwd'.
83+
*Uses search tools to locate all instances of getCwd in the codebase*
84+
Assistant: I've found 15 instances of 'getCwd' across 8 different files. Let me create a todo list to track these changes.
85+
*Creates todo list with specific items for each file that needs updating*
86+
87+
<reasoning>
88+
The assistant used the todo list because:
89+
1. The assistant searched to understand the scope of the task
90+
2. Upon finding multiple occurrences across different files, it determined this was a complex task with multiple steps (>3)
91+
3. The todo list helps ensure every instance is tracked and updated systematically
92+
4. This approach prevents missing any occurrences and maintains consistency.
93+
</reasoning>
94+
</example>
95+
96+
<example>
97+
User: I need help organizing my home office: decluttering the desk, setting up a filing system, organizing digital files, and creating a daily workflow.
98+
Assistant: I'll help you organize your home office systematically. Let me create a todo list to track each area of organization.
99+
*Creates todo list with items like: 1) Sort and declutter desk items, 2) Set up physical filing system for documents, 3) Organize digital files and folders, 4) Create daily workflow and productivity systems, 5) Establish maintenance routines*
100+
Let's start with decluttering your desk space to create a clean foundation for the other organizational systems.
101+
102+
<reasoning>
103+
The assistant used the todo list because:
104+
1. The user provided multiple distinct organizational tasks
105+
2. Each area requires different approaches and careful planning
106+
3. Home organization benefits from systematic, step-by-step completion
107+
4. Progress tracking ensures all areas are addressed thoroughly
108+
</reasoning>
109+
</example>
110+
111+
## Examples of When NOT to Use the Todo List
112+
113+
<example>
114+
User: How do I print 'Hello World' in Python?
115+
Assistant: In Python, you can print "Hello World" with this simple code:
116+
117+
\`\`\`python
118+
print("Hello World")
119+
\`\`\`
120+
121+
This will output the text "Hello World" to the console when executed.
122+
123+
<reasoning>
124+
The assistant did not use the todo list because this is a single, trivial task that can be completed in one step. There's no need to track multiple tasks or steps for such a straightforward request.
125+
</reasoning>
126+
</example>
127+
128+
<example>
129+
User: What's the best way to water houseplants?
130+
Assistant: The best way to water houseplants depends on the specific plant type, but here are some general guidelines:
131+
132+
- Check soil moisture by inserting your finger about 1-2 inches into the soil
133+
- Water thoroughly when the top inch feels dry, allowing water to drain from the bottom
134+
- Use room temperature water and water in the morning when possible
135+
- Avoid overwatering, which is more harmful than underwatering for most plants
136+
137+
Different plants have different needs, so it's important to research your specific varieties for optimal care.
138+
139+
<reasoning>
140+
The assistant did not use the todo list because this is an informational request with no actual tasks to complete. The user is simply asking for advice and explanation, not for the assistant to perform multiple steps or activities.
141+
</reasoning>
142+
</example>
143+
144+
<example>
145+
User: Can you add a comment to the calculateTotal function to explain what it does?
146+
Assistant: Sure, let me add a comment to the calculateTotal function to explain what it does.
147+
* Uses the Edit tool to add a comment to the calculateTotal function *
148+
149+
<reasoning>
150+
The assistant did not use the todo list because this is a single, straightforward task confined to one location. Adding a comment doesn't require tracking multiple steps or systematic organization.
151+
</reasoning>
152+
</example>
153+
154+
<example>
155+
User: What time is it in Tokyo right now?
156+
Assistant: I'll check the current time in Tokyo for you.
157+
158+
*Searches for current time in Tokyo*
159+
160+
The current time in Tokyo, Japan is [current time]. Tokyo is in the Japan Standard Time (JST) zone, which is UTC+9.
161+
162+
<reasoning>
163+
The assistant did not use the todo list because this is a single information lookup with immediate results. There are no multiple steps to track or organize, making the todo list unnecessary for this straightforward request.
164+
</reasoning>
165+
</example>
166+
167+
<example>
168+
User: I need to write a function that checks if a number is prime and then test it out.
169+
Assistant: I'll help you write a function that checks if a number is prime and then test it out.
170+
*Writes function that checks if a number is prime*
171+
*Tests the function*
172+
173+
<reasoning>
174+
Even though this is a multi-step task, it is very straightforward and can be completed in two trivial steps (which is less than 3 steps!). Using the todo list here is overkill and wastes time and tokens.
175+
</reasoning>
176+
</example>
177+
178+
<example>
179+
User: I want you to order a pizza from Dominos, order a burger from McDonald's, and order a salad from Subway.
180+
Assistant: I'll help you order a pizza from Dominos, order a burger from McDonald's, and order a salad from Subway.
181+
*Orders a pizza from Dominos*
182+
*Orders a burger from McDonald's*
183+
*Orders a salad from Subway*
184+
185+
<reasoning>
186+
Even though this is a multi-step task, assuming the assistant has the ability to order from these restaurants, it is very straightforward and can be completed in three trivial tool calls.
187+
Using the todo list here is overkill and wastes time and tokens. These three tool calls should be made in parallel, in fact.
188+
</reasoning>
189+
</example>
190+
191+
192+
## Task States and Management
193+
194+
1. **Task States**: Use these states to track progress:
195+
- pending: Task not yet started
196+
- in_progress: Currently working on (you can have multiple tasks in_progress at a time if they are not related to each other and can be run in parallel)
197+
- completed: Task finished successfully
198+
199+
2. **Task Management**:
200+
- Update task status in real-time as you work
201+
- Mark tasks complete IMMEDIATELY after finishing (don't batch completions)
202+
- Complete current tasks before starting new ones
203+
- Remove tasks that are no longer relevant from the list entirely
204+
- IMPORTANT: When you write this todo list, you should mark your first task (or tasks) as in_progress immediately!
205+
- IMPORTANT: Unless all tasks are completed, you should always have at least one task in_progress to show the user that you are working on something.
206+
207+
3. **Task Completion Requirements**:
208+
- ONLY mark a task as completed when you have FULLY accomplished it
209+
- If you encounter errors, blockers, or cannot finish, keep the task as in_progress
210+
- When blocked, create a new task describing what needs to be resolved
211+
- Never mark a task as completed if:
212+
- There are unresolved issues or errors
213+
- Work is partial or incomplete
214+
- You encountered blockers that prevent completion
215+
- You couldn't find necessary resources or dependencies
216+
- Quality standards haven't been met
217+
218+
4. **Task Breakdown**:
219+
- Create specific, actionable items
220+
- Break complex tasks into smaller, manageable steps
221+
- Use clear, descriptive task names
222+
223+
Being proactive with task management demonstrates attentiveness and ensures you complete all requirements successfully.
224+
Remember: If you only need to make a few tool calls to complete a task, and it is clear what you need to do, it is better to just do the task directly and NOT call this tool at all.`;
225+
226+
const systemPrompt = `## \`write_todos\`
227+
228+
You have access to the \`write_todos\` tool to help you manage and plan complex objectives.
229+
Use this tool for complex objectives to ensure that you are tracking each necessary step and giving the user visibility into your progress.
230+
This tool is very helpful for planning complex objectives, and for breaking down these larger complex objectives into smaller steps.
231+
232+
It is critical that you mark todos as completed as soon as you are done with a step. Do not batch up multiple steps before marking them as completed.
233+
For simple objectives that only require a few steps, it is better to just complete the objective directly and NOT use this tool.
234+
Writing todos takes time and tokens, so use this tool when it helps manage complex many-step problems, but not for simple few-step requests.
235+
236+
## Important To-Do List Usage Notes to Remember
237+
- The \`write_todos\` tool should never be called multiple times in parallel.
238+
- Don't be afraid to revise the To-Do list as you go. New information may reveal new tasks that need to be done, or old tasks that are irrelevant.`;
239+
240+
const TodoStatus = z
241+
.enum(["pending", "in_progress", "completed"])
242+
.describe("Status of the todo");
243+
const TodoSchema = z.object({
244+
content: z.string().describe("Content of the todo item"),
245+
status: TodoStatus,
246+
});
247+
const stateSchema = z.object({
248+
todos: z.array(TodoSchema).default([]),
249+
});
250+
export type TodoMiddlewareState = z.infer<typeof stateSchema>;
251+
252+
/**
253+
* Write todos tool - manages todo list with Command return
254+
* The submitted list replaces `todos` in state; a ToolMessage reports the update
255+
*/
256+
const writeTodos = tool(
257+
({ todos }, config) => {
258+
return new Command({
259+
update: {
260+
todos,
261+
messages: [
262+
new ToolMessage({
263+
content: `Updated todo list to ${JSON.stringify(todos)}`,
264+
tool_call_id: config.toolCall?.id as string,
265+
}),
266+
],
267+
},
268+
});
269+
},
270+
{
271+
name: "write_todos",
272+
description: WRITE_TODOS_DESCRIPTION,
273+
schema: z.object({
274+
todos: z.array(TodoSchema).describe("List of todo items to update"),
275+
}),
276+
}
277+
);
278+
279+
/**
280+
* Creates a middleware that provides todo list management capabilities to agents.
281+
*
282+
* This middleware adds a `write_todos` tool that allows agents to create and manage
283+
* structured task lists for complex multi-step operations. It's designed to help
284+
* agents track progress, organize complex tasks, and provide users with visibility
285+
* into task completion status.
286+
*
287+
* The middleware automatically injects system prompts that guide the agent on when
288+
* and how to use the todo functionality effectively.
289+
*
290+
* @example
291+
* ```typescript
292+
* import { todoMiddleware } from './middleware/todo.js';
293+
* import { createAgent } from '../index.js';
294+
*
295+
* const agent = createAgent({
296+
* model: chatModel,
297+
* middleware: [todoMiddleware()],
298+
* });
299+
*
300+
* // Agent now has access to write_todos tool and todo state tracking
301+
* const result = await agent.invoke({
302+
* messages: [new HumanMessage("Help me refactor my codebase")]
303+
* });
304+
*
305+
* console.log(result.todos); // Array of todo items with status tracking
306+
* ```
307+
*
308+
* @returns A configured middleware instance that provides todo management capabilities
309+
*
310+
* @see {@link TodoMiddlewareState} for the state schema
311+
* @see {@link writeTodos} for the tool implementation
312+
*/
313+
export function todoMiddleware() {
314+
return createMiddleware({
315+
name: "todoMiddleware",
316+
stateSchema,
317+
tools: [writeTodos],
318+
modifyModelRequest: (request) => ({
319+
...request,
320+
systemPrompt: (request.systemPrompt ?? "") + systemPrompt,
321+
}),
322+
});
323+
}
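To make the two moving parts of this middleware concrete, here is a minimal, dependency-free sketch of what the diff appears to do. It is not the LangChain implementation: `AgentState`, `ModelRequest`, `ToolMessageLike`, `applyWriteTodos`, and `appendTodoPrompt` are hypothetical names, `TODO_PROMPT` is a shortened placeholder for the real prompt, and the sketch assumes that the message channel appends new messages rather than replacing them, which the diff itself does not show.

```typescript
// Simplified stand-ins for illustration only; these are NOT the LangChain types.
type TodoStatus = "pending" | "in_progress" | "completed";

interface Todo {
  content: string;
  status: TodoStatus;
}

interface ToolMessageLike {
  content: string;
  tool_call_id: string;
}

interface AgentState {
  todos: Todo[];
  messages: ToolMessageLike[];
}

interface ModelRequest {
  systemPrompt?: string;
}

// Shortened placeholder for the prompt text defined in the diff.
const TODO_PROMPT = "## `write_todos`\n\nYou have access to the `write_todos` tool.";

// Mirrors the tool body: the submitted list replaces the old one wholesale,
// and a single message reports the update. Appending to `messages` is an
// assumption about how the state update is merged.
function applyWriteTodos(state: AgentState, todos: Todo[], toolCallId: string): AgentState {
  return {
    todos,
    messages: [
      ...state.messages,
      { content: `Updated todo list to ${JSON.stringify(todos)}`, tool_call_id: toolCallId },
    ],
  };
}

// Mirrors the modifyModelRequest hook: the todo prompt is concatenated onto
// whatever system prompt already exists, or used alone when there is none.
function appendTodoPrompt(request: ModelRequest): ModelRequest {
  return { ...request, systemPrompt: (request.systemPrompt ?? "") + TODO_PROMPT };
}

const initial: AgentState = { todos: [], messages: [] };
const next = applyWriteTodos(
  initial,
  [
    { content: "Find usages of getCwd", status: "in_progress" },
    { content: "Rename each usage", status: "pending" },
  ],
  "call_1"
);

const withoutPrompt = appendTodoPrompt({});
const withPrompt = appendTodoPrompt({ systemPrompt: "You are a helpful assistant." });

console.log(next.todos.length, next.messages.length);
console.log(withoutPrompt.systemPrompt === TODO_PROMPT);
```

Because the concatenation adds no separator, an existing system prompt runs directly into the `## write_todos` heading, so a caller who wants a visual break may need to end their own prompt with a blank line.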
