Replies: 2 comments 1 reply
-
This is seriously impressive. You basically turned browser automation into a modular AI agent framework with vision, memory, and retry logic, all in plain English prompts? Most devs still hard-code their Playwright tests manually... and here you are generating PDF reports and managing state transitions like it’s a video game dialogue tree.
I think the self-improving workflow part is underrated: if this becomes stable, it’s not just about testing or scraping anymore, it's a legit ops-level AI agent. Respect for keeping it LangGraph-native too.
Curious: have you tested this with more abstract or "semantic" prompts (e.g., “file a refund request unless order already shipped”)?
Anyway, well done. Should’ve had 30 replies by now. Bookmarking this for future RAG + agent workflows. I actually built a .txt-only OS for language models (no install, no API keys); it might be fun to see how it plays with something like talk2browser. Will share if I wire up a demo.
-
Thanks! That really resonates. I love how talk2browser embraces goal-oriented execution over static step-following. Feels like we're definitely converging on the same bottlenecks from different angles.
We recently published a full breakdown of 19 recurring AI problems (esp. around semantic → action mapping, hallucination, RAG bottlenecks, multi-hop goals, etc.) and how WFGY handles them inside the .txt interface model.
📌 WFGY Problem Map: full reasoning chain + solved issues
Would love to compare notes more deeply once you’ve seen how we’re wiring logic in. Some of the failure cases we hit are probably very familiar to you.
-
Ever wanted to automate real browser actions just by describing what you want? Meet talk2browser, a LangGraph-powered agent that turns prompts into real-time web actions and reusable test scripts.
Hi everyone! 👋 I'm excited to share talk2browser, which leverages LangGraph's agent orchestration capabilities to create a self-improving browser automation system. Inspired by the Browser-Use open source project, it takes natural language tasks and executes real browser actions while generating reusable test scripts.
🔗 LangGraph Implementation
talk2browser showcases advanced LangGraph patterns:
AgentState TypedDict
✨ Key Features
🧠 Agent Architecture
The LangGraph agent uses a two-node graph with conditional routing:
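To make the control flow concrete, here is a minimal, dependency-free sketch of a two-node loop with conditional routing. It deliberately does not use the LangGraph API, and it is not talk2browser's actual code: the state fields, the `TOOLS` table, and the node and function names are all hypothetical. The tools node also logs each call's arguments, result, error, and execution time, loosely mirroring what this post says the ActionService records.

```python
import time
from typing import Any, Callable, Dict, List, TypedDict


class AgentState(TypedDict):
    """Hypothetical state shape; the real AgentState fields are not shown in the post."""
    pending: List[Dict[str, Any]]   # tool calls the agent still wants to run
    records: List[Dict[str, Any]]   # one entry per executed tool call
    steps: int                      # number of times the agent node has run


# Hypothetical tools standing in for real browser actions.
TOOLS: Dict[str, Callable[..., Any]] = {
    "navigate": lambda url: f"opened {url}",
    "click": lambda selector: f"clicked {selector}",
}


def agent_node(state: AgentState) -> AgentState:
    """Stand-in for the LLM call: plans two tool calls on the first step, none afterwards."""
    if state["steps"] == 0:
        state["pending"] = [
            {"name": "navigate", "args": {"url": "https://example.com"}},
            {"name": "hover", "args": {"selector": "#menu"}},  # unknown tool, to show error capture
        ]
    state["steps"] += 1
    return state


def tools_node(state: AgentState) -> AgentState:
    """Runs every pending call and records arguments, result, error, and execution time."""
    for call in state["pending"]:
        started = time.perf_counter()
        result, error = None, None
        try:
            result = TOOLS[call["name"]](**call["args"])
        except Exception as exc:  # record the failure instead of aborting the run
            error = repr(exc)
        state["records"].append({
            "name": call["name"],
            "args": call["args"],
            "result": result,
            "error": error,
            "seconds": time.perf_counter() - started,
        })
    state["pending"] = []
    return state


def route(state: AgentState) -> str:
    """Conditional edge: go to the tools node if calls are pending, otherwise stop."""
    return "tools" if state["pending"] else "end"


def run(max_steps: int = 10) -> AgentState:
    """Alternates agent and tools nodes until the router says to stop."""
    state: AgentState = {"pending": [], "records": [], "steps": 0}
    while state["steps"] < max_steps:
        state = agent_node(state)
        if route(state) == "end":
            break
        state = tools_node(state)
    return state


final_state = run()
```

In a real LangGraph graph, the `while` loop would be replaced by the framework's own node and conditional-edge wiring; the sketch only shows the shape of the agent → router → tools → agent cycle and why failed tool calls are recorded rather than raised.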
The agent maintains context across browser sessions and learns from previous automation patterns through the ActionService, which records all tool calls with execution time, arguments, results, and errors.
🚀 Quick Example
Here's how to automate GitHub trending analysis:
CLI Usage
Or use the CLI with predefined tasks:
🎮 Getting Started
Prerequisites
Installation
Quick Test
🔍 Code Quality & Development
This project maintains high code quality through automated checks:
Local Development
📚 Resources
🛠️ Technical Architecture
Core Components
Tool Registration System
State Management
🤝 Community Questions
I'd love to hear from the LangChain community:
🔮 Future Roadmap
🛠️ Technical Stack
Looking for feedback, use cases, and contributions! What browser automation challenges could this help solve for your projects? 🤔
Feel free to star ⭐ the repo if you find this interesting!
🏷️ Tags
#langgraph
#browser-automation
#playwright
#ai-agents
#test-automation
#natural-language
#python
#claude
#computer-vision
#pdf-generation