browser-use/browser-use
Browser Use
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. It acts as an orchestration layer between large language models and browser automation protocols, so complex, multi-step workflows can run without hand-maintained, brittle selectors. The system also functions as a headless browser controller, providing a programmatic interface to manage browser instances and execute granular interactions.
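The orchestration idea described above can be illustrated with a minimal observe-decide-act loop. This is a conceptual sketch only, not the project's API: `FakeBrowser`, `scripted_model`, `Action`, and `run_agent` are hypothetical names, and the "model" is a hard-coded function standing in for a real language model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """A single command chosen by the model (hypothetical shape)."""
    name: str                       # e.g. "navigate" or "done"
    args: dict = field(default_factory=dict)


class FakeBrowser:
    """Stand-in for a real browser controller; it only tracks a URL."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log = []               # names of executed actions, in order

    def observe(self) -> str:
        # A real system would return serialized page state here.
        return f"url={self.url}"

    def execute(self, action: Action) -> None:
        if action.name == "navigate":
            self.url = action.args["url"]
        self.log.append(action.name)


def scripted_model(task: str, observation: str) -> Action:
    """Hard-coded policy standing in for an LLM call (assumption)."""
    if observation == "url=about:blank":
        return Action("navigate", {"url": "https://example.com"})
    return Action("done", {"result": f"finished: {task}"})


def run_agent(
    task: str,
    browser: FakeBrowser,
    model: Callable[[str, str], Action],
    max_steps: int = 10,
) -> Optional[str]:
    """Loop: observe the page, ask the model for an action, execute it."""
    for _ in range(max_steps):
        action = model(task, browser.observe())
        if action.name == "done":
            return action.args.get("result")
        browser.execute(action)
    return None                     # step budget exhausted without "done"
```

Capping the loop with `max_steps` is a design choice in this sketch so that a model that never emits `done` cannot run forever.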
The project distinguishes itself through its ability to translate high-level intent into specific browser primitives, supported by a serialization process that converts complex web page structures into simplified text for model processing. It supports stateful session persistence, allowing agents to maintain authenticated environments across long-running tasks. The framework also facilitates remote browser orchestration, so automation routines can scale in cloud environments with integrated support for stealth configurations and proxy management.
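To make the serialization step more concrete, the sketch below reduces an HTML fragment to a short, indexed list of interactive elements that a model could refer to by number. It is an illustration built on Python's standard `html.parser`, not the project's actual serializer; the choice of tags in `INTERACTIVE_TAGS` and the `[index] <tag> label` output format are assumptions.

```python
from html.parser import HTMLParser

# Assumed set of tags worth exposing to the model.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Tags whose visible label comes from their inner text.
TEXT_CONTAINERS = {"a", "button"}


class InteractiveElementSerializer(HTMLParser):
    """Collects interactive elements and a short label for each."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = []   # list of {"tag": ..., "text": ...}
        self._open = None    # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        label = attr_map.get("placeholder") or attr_map.get("aria-label") or ""
        self.elements.append({"tag": tag, "text": label})
        if tag in TEXT_CONTAINERS:
            self._open = len(self.elements) - 1

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            entry = self.elements[self._open]
            entry["text"] = (entry["text"] + " " + text).strip()

    def handle_endtag(self, tag):
        if self._open is not None and self.elements[self._open]["tag"] == tag:
            self._open = None


def serialize(html: str) -> str:
    """Return one line per interactive element: '[index] <tag> label'."""
    parser = InteractiveElementSerializer()
    parser.feed(html)
    parser.close()
    return "\n".join(
        f"[{i}] <{e['tag']}> {e['text']}".rstrip()
        for i, e in enumerate(parser.elements)
    )
```

Non-interactive content such as headings is dropped in this sketch, which keeps the text handed to the model short; a fuller serializer would likely keep some surrounding context.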
Beyond its core agent capabilities, the platform provides extensive tooling for structured data extraction and workflow integration. It supports a variety of model configurations and allows for the definition of custom tools to extend interaction logic. The project documentation includes quickstart guides for command-line execution and examples for integrating browser automation into broader software ecosystems.
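As a rough picture of what defining a custom tool can look like, the sketch below registers plain Python functions under a description and exposes them by name, so a model's chosen action can be dispatched to code. `ToolRegistry`, its `action` decorator, and `parse_price` are hypothetical names for illustration; the framework's real registration interface may differ.

```python
import inspect


class ToolRegistry:
    """Minimal name-to-function registry with human-readable descriptions."""

    def __init__(self) -> None:
        self._tools = {}   # name -> (description, function)

    def action(self, description: str):
        """Decorator that registers a function as a callable tool."""
        def decorator(func):
            self._tools[func.__name__] = (description, func)
            return func
        return decorator

    def describe(self) -> str:
        """Text listing of tools, suitable for inclusion in a model prompt."""
        lines = []
        for name, (description, func) in self._tools.items():
            params = ", ".join(inspect.signature(func).parameters)
            lines.append(f"{name}({params}): {description}")
        return "\n".join(lines)

    def call(self, name: str, **kwargs):
        """Dispatch a model-selected tool name to its function."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        _, func = self._tools[name]
        return func(**kwargs)


tools = ToolRegistry()


@tools.action("Convert a price string such as '$1,299.00' to a float")
def parse_price(text: str) -> float:
    return float(text.replace("$", "").replace(",", ""))
```

A caller would pass `tools.describe()` to the model and route its reply through `tools.call(...)`, for example `tools.call("parse_price", text="$1,299.00")`.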
Features
- LLM-Driven Agent Loops - Orchestrates multi-step task execution by iteratively processing visual page context and generating actionable commands through a language model.
- Autonomous Browser Agents - A framework for deploying intelligent agents that interpret natural language goals to navigate, interact with, and extract data from web interfaces.
- Autonomous Web Agents - Deploy autonomous agents to perform multi-step web tasks by defining high-level goals and managing browser sessions through integrated language models.
- Structured Data Extraction - Converts unstructured web content into clean, typed data formats by automating navigation and interaction across dynamic, modern web applications.
- Structured Web Data Extractors - A specialized engine that transforms unstructured web content into typed, schema-compliant data formats using vision-capable language models and DOM analysis.
- Web Interaction Agents - Execute complex web tasks using natural language instructions to extract structured data, manage files, and coordinate human-in-the-loop approval workflows.
- LLM-Powered Automation Orchestrators - A control layer that bridges large language models with browser automation protocols to execute complex, multi-step workflows across web applications.
- CDP Automation Interfaces - Communicates with browser instances via the Chrome DevTools Protocol to execute low-level commands and capture real-time page state.
- Browser Interaction Primitives - Built-in Browser Actions, a named example documented in this learning resource.
- Headless Browser Controllers - A programmatic interface for managing remote browser instances, handling session persistence, and executing granular DOM interactions through standardized automation drivers.
- Session Persistence Mechanisms - Synchronizes cookies and local storage across automation cycles to maintain authenticated browser environments for long-running, multi-step workflows.
- Action-Tool Abstractions - Maps high-level natural language intents to specific browser primitives, allowing for modular extension and custom interaction logic definition.
- Remote Browser Infrastructure Management - Deploys and scales headless browser instances in cloud environments with support for stealth, proxies, and remote debugging capabilities.
- Browser Environment Configurations - Configure browser instances with stealth settings, residential proxies, and live streaming capabilities to support standard automation protocols and remote debugging.
- DOM Serialization Tools - Converts complex web page structures into simplified text representations to provide language models with actionable navigation and interaction targets.
- Remote Browser Orchestration - Connects to distributed cloud-based browser infrastructure to scale automation tasks while managing proxy and stealth configurations externally.
- Structured Data Retrievers - Structured Data Retrieval, a named example documented in this learning resource.
- Browser Session Persistence - Persist browser state across sessions by synchronizing cookies and local storage to maintain continuous user identity for automated web tasks.
- Generative Model Configurations - Gemini Model Configuration, a named example documented in this learning resource.
- Browser-Based Workflow Automations - Connects web-based software to external systems and APIs to synchronize data and automate repetitive cross-platform business processes.
- Custom Tool Definitions - Custom Tool Definition, a named example documented in this learning resource.
- Browser Automation Orchestrators - Trigger browser automation routines programmatically through RESTful endpoints to handle authentication and task execution in remote environments.
- Typed Data Extraction - Typed Data Extraction, a named example documented in this learning resource.
- Workflow Integration Hooks - Connect browser automation to external systems using standardized protocols and webhooks to synchronize data across disparate platforms.
- CLI Browser Automation Tools - Execute navigation and interaction commands directly from the terminal to capture page state and accelerate the development of automation scripts.
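The CDP Automation Interfaces entry in the list above refers to the Chrome DevTools Protocol, in which each command is a JSON object carrying an `id`, a `method`, and `params`, and the browser's reply echoes the same `id`. The sketch below only builds such messages; `CdpCommandBuilder` is a hypothetical helper, and sending the messages over the browser's WebSocket debugging endpoint is left out.

```python
import itertools
import json


class CdpCommandBuilder:
    """Builds Chrome DevTools Protocol command messages with unique ids."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)   # ids let replies be matched to commands

    def build(self, method: str, **params) -> str:
        """Return the JSON text for one CDP command."""
        message = {"id": next(self._ids), "method": method, "params": params}
        return json.dumps(message)


builder = CdpCommandBuilder()

# Page.navigate and Runtime.evaluate are standard CDP methods.
navigate = builder.build("Page.navigate", url="https://example.com")
evaluate = builder.build("Runtime.evaluate", expression="document.title")
```

Keeping the id counter inside the builder is a small design choice in this sketch: every message gets a fresh id, so responses arriving out of order can still be paired with their commands.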