Let's build a Browser Automation Agent using gpt-oss (100% local):
The browser is still the most universal interface, with 4.3 billion pages visited every day! Here's a quick demo of how we can completely automate it!

Tech stack:
- @stagehanddev open-source AI browser automation
- @crewAIInc for orchestration
- @ollama to run gpt-oss

Let's go!🚀
System overview:
- User enters an automation query.
- Planner Agent creates an automation plan.
- The Browser Automation Agent executes it using the Stagehand tool.
- The Response Agent generates a response.

Now, let's dive into the code!
1️⃣ Define LLMs

We use three LLMs:
- Planner LLM: Creates a structured plan for an automation task.
- Automation LLM: Executes the plan using the Stagehand tool.
- Response LLM: Synthesizes the final response.

Check this out 👇
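Here's a minimal sketch of the three-LLM setup. It is not the thread's actual code: the endpoint (Ollama's default local port), the `ollama/gpt-oss:20b` model string, the temperatures, and the `LLMConfig` / `build_llms` names are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

# Assumed values: Ollama's default local endpoint and the 20B gpt-oss tag.
OLLAMA_URL = "http://localhost:11434"
MODEL = "ollama/gpt-oss:20b"


@dataclass(frozen=True)
class LLMConfig:
    """Settings for one role; hypothetical helper, not a CrewAI class."""
    model: str = MODEL
    base_url: str = OLLAMA_URL
    temperature: float = 0.0


# One config per role. The temperatures are illustrative guesses.
LLM_CONFIGS = {
    "planner": LLMConfig(temperature=0.1),
    "automation": LLMConfig(temperature=0.0),
    "response": LLMConfig(temperature=0.3),
}


def build_llms(factory):
    """Create one LLM object per role by calling factory(**settings)."""
    return {role: factory(**asdict(cfg)) for role, cfg in LLM_CONFIGS.items()}
```

With CrewAI installed, passing its `LLM` class as the factory (`build_llms(LLM)`) should give one local model handle per role, since `LLM` accepts `model`, `base_url`, and `temperature`. Passing `dict` instead just returns the raw settings, which is handy for a quick check.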
2️⃣ Define Automation Planner Agent

The planner agent receives an automation task from the user and turns it into a structured plan that the browser agent can execute.

Check this out 👇
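To make "structured plan" concrete, here's one possible shape for it. The field names (`task`, `url`, `steps`, `action`, `instruction`) and the allowed actions are assumptions for this sketch; the thread does not spell out the real schema.

```python
import json
from dataclasses import dataclass

# Assumed action vocabulary for this sketch.
ALLOWED_ACTIONS = {"navigate", "act", "extract"}


@dataclass
class PlanStep:
    action: str       # one of ALLOWED_ACTIONS
    instruction: str  # natural-language instruction for the browser agent


@dataclass
class AutomationPlan:
    task: str
    url: str
    steps: list[PlanStep]


def parse_plan(raw: str) -> AutomationPlan:
    """Parse the planner's JSON output and reject malformed plans early."""
    data = json.loads(raw)
    steps = [PlanStep(**step) for step in data["steps"]]
    if not steps:
        raise ValueError("plan has no steps")
    for step in steps:
        if step.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {step.action}")
    return AutomationPlan(task=data["task"], url=data["url"], steps=steps)
```

Validating the plan before it reaches the browser agent means a bad planner output fails fast instead of halfway through a browsing session.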
3️⃣ Define Stagehand Browser Tool

A custom CrewAI tool that uses AI to interact with web pages. It leverages Stagehand's computer-use agentic capabilities to autonomously navigate URLs, perform page actions, and extract data to answer questions.

Check this out 👇
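A rough sketch of what such a tool wrapper can look like. This is not Stagehand's or CrewAI's API: the `Page` protocol, `BrowserTool`, and `FakePage` are hypothetical stand-ins, and the synchronous `goto` / `act` / `extract` methods are assumed so the example runs without a browser.

```python
from typing import Protocol


class Page(Protocol):
    """Minimal page interface assumed for this sketch."""
    def goto(self, url: str) -> None: ...
    def act(self, instruction: str) -> str: ...
    def extract(self, instruction: str) -> str: ...


class BrowserTool:
    """Hypothetical tool: opens a URL, then runs one AI-driven command."""
    name = "browser_automation"
    description = "Navigate to a URL and act on or extract from the page."

    def __init__(self, page: Page):
        self.page = page

    def run(self, url: str, instruction: str, command: str = "act") -> str:
        if command not in ("act", "extract"):
            return f"Error: unsupported command '{command}'"
        try:
            self.page.goto(url)
            handler = getattr(self.page, command)
            return handler(instruction)
        except Exception as exc:
            # Return the error as text so the calling agent can react to it.
            return f"Error: {exc}"


class FakePage:
    """In-memory stand-in for a real browser page, for local testing only."""
    def __init__(self):
        self.visited = []

    def goto(self, url: str) -> None:
        self.visited.append(url)

    def act(self, instruction: str) -> str:
        return f"acted: {instruction}"

    def extract(self, instruction: str) -> str:
        return f"extracted: {instruction}"
```

Returning errors as strings rather than raising keeps the agent loop alive, so the LLM can retry with a different instruction.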
4️⃣ Define Browser Automation Agent

The Browser Automation Agent uses the Stagehand tool defined above for autonomous browser control and plan execution.

Check this out 👇
5️⃣ Define Response Synthesis Agent

The Response Synthesis Agent acts as the final quality check, refining the output from the browser automation agent into a polished response.

Check this out 👇
6️⃣ Create CrewAI Agentic Flow

Finally, we connect our Agents within a workflow using CrewAI Flows.

Check this 👇
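The wiring itself is simple: plan, execute, respond. Here's a plain-Python sketch of that chain. It does not use CrewAI Flows; `FlowState` and `run_flow` are hypothetical names, and each agent is modeled as an ordinary callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FlowState:
    """Data carried between the three stages (assumed fields)."""
    query: str
    plan: str = ""
    raw_result: str = ""
    response: str = ""


def run_flow(
    query: str,
    planner: Callable[[str], str],
    browser: Callable[[str], str],
    responder: Callable[[str, str], str],
) -> FlowState:
    """Run planner -> browser -> responder, recording each stage's output."""
    state = FlowState(query=query)
    state.plan = planner(state.query)             # Planner Agent
    state.raw_result = browser(state.plan)        # Browser Automation Agent
    state.response = responder(state.query, state.raw_result)  # Response Agent
    return state
```

In CrewAI Flows the same chain is typically expressed with a `@start()` method followed by `@listen(...)` methods, with the state kept on the flow object.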
Done! Let's see our multi-agent browser automation workflow in action! 🚀 Check this 👇
You can find all the code and everything you need in the GitHub Repository shared below. Check this out 👇
To recap, here's the system overview for your reference:
- User enters an automation query.
- Planner Agent creates an automation plan.
- The Browser Automation Agent executes it using the Stagehand tool.
- The Response Agent generates a response.

Check this👇
If you found it insightful, reshare with your network. Find me → @akshay_pachaar ✔️ For more insights and tutorials on LLMs, AI Agents, and Machine Learning!
Akshay 🚀, 10 August, 20:51