---
name: owl-browser
description: Automates browsers using Owl Browser, a high-performance browser engine with 163 tools for web navigation, element interaction, text extraction, screenshots, fingerprint management, CAPTCHA solving, proxy management, video recording, and cookie management. Use when automating web browsing, extracting data, testing web applications, filling forms, or controlling browser instances via Python SDK, Node.js SDK, or REST API.
license: Proprietary
compatibility: Requires a running Owl Browser instance (Docker or native). Network access needed to connect to the Owl Browser HTTP server.
metadata:
  author: olib-ai
  version: "1.1.2"
  website: https://www.owlbrowser.net
---

# Owl Browser

Complete reference for AI agents to automate browsers using Owl Browser.

## Contents

- [Core Concepts](#core-concepts)
- [Environment Setup](#environment-setup)
- [Python SDK](#python-sdk)
- [Node.js SDK](#nodejs-sdk)
- [REST API (Direct)](#rest-api-direct)
- [Flow JSON Format](#flow-json-format)
- [Tools Reference](#tools-reference)

---

## Core Concepts

- **Context**: An isolated browser instance with its own fingerprint, cookies,
  and proxy. Create with `browser_create_context`, close with `browser_close_context`.
  Every browser tool call requires a `context_id`, except context
  creation/listing and the `http_*` tools.
- **Tool**: An atomic browser action (navigate, click, type, screenshot, etc.).
  Execute via SDK methods or REST `POST /api/execute/{tool_name}` (no `/api`
  prefix when connecting directly to the HTTP server).
- **Flow**: A JSON sequence of tool steps executed in order with variable
  resolution and expectations. Portable across SDKs.
- **Antidetect**: Each context gets a unique, realistic browser fingerprint
  (canvas, WebGL, audio, fonts, navigator, etc.) from a database of 100+
  real device profiles.
- **Smart Selectors**: Any tool that takes a `selector` parameter accepts
  three formats — the browser auto-detects which one you're using:
  - **CSS selector**: `#submit-btn`, `.nav-link`, `input[name=email]`
  - **Coordinates**: `100x200` (clicks at x=100, y=200)
  - **Natural language**: `login button`, `email input`, `search icon`

  Natural language selectors use a built-in semantic matcher that scores
  page elements by text similarity, ARIA labels, placeholders, and visual
  context — no CSS inspection required. Prefer natural language when you
  don't know the exact selector; use CSS when you need precision.
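
The three selector formats can be told apart with a rough client-side check, for example to log which kind of selector a script is sending. The sketch below is a hypothetical heuristic (the function name and rules are assumptions); the browser's own auto-detection is internal and may work differently.

```python
import re

# Hypothetical client-side heuristic; Owl Browser's real auto-detection
# happens inside the browser and may use different rules.
_COORDS = re.compile(r"^\d+x\d+$")
_CSS_HINTS = re.compile(r"[#.\[\]>:=]")


def guess_selector_format(selector: str) -> str:
    """Return 'coordinates', 'css', or 'natural_language' for a selector."""
    s = selector.strip()
    if _COORDS.match(s):
        return "coordinates"
    if _CSS_HINTS.search(s):
        return "css"
    return "natural_language"


for example in ["#submit-btn", "input[name=email]", "100x200", "login button"]:
    print(f"{example!r:22} -> {guess_selector_format(example)}")
```

A bare tag name such as `button` is ambiguous under this heuristic and is reported as natural language, which is one reason to write CSS selectors explicitly when precision matters.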

---

## Environment Setup

Store your connection settings in a `.env` file (never commit this file):

```bash
# .env
OWL_ENDPOINT=https://your-domain.com
OWL_TOKEN=your-secret-token
```

All SDK and REST API examples below read from these variables.
Make sure `.env` is in your `.gitignore`.
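
A script can fail early with a clear message when one of these variables is missing, instead of surfacing a connection error later. The helper below is a minimal sketch using only the standard library; `load_owl_settings` is a hypothetical name, not part of any SDK.

```python
import os
from typing import Mapping


def load_owl_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read OWL_ENDPOINT and OWL_TOKEN, failing fast if either is missing."""
    missing = [name for name in ("OWL_ENDPOINT", "OWL_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {
        # Strip a trailing slash so paths can be appended predictably.
        "endpoint": env["OWL_ENDPOINT"].rstrip("/"),
        "token": env["OWL_TOKEN"],
    }


# Demo with an explicit mapping instead of the real environment.
settings = load_owl_settings(
    {"OWL_ENDPOINT": "https://your-domain.com/", "OWL_TOKEN": "your-secret-token"}
)
print(settings["endpoint"])
```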

---

## Python SDK

### Installation

```bash
pip install owl-browser python-dotenv
```

### Connection

```python
import os
from dotenv import load_dotenv
from owl_browser import OwlBrowser, RemoteConfig

load_dotenv()  # Loads .env file

config = RemoteConfig(
    url=os.environ["OWL_ENDPOINT"],
    token=os.environ["OWL_TOKEN"]
)

# For development (direct connection to the HTTP server on port 8080):
# set OWL_ENDPOINT=http://localhost:8080 in .env and pass this extra
# argument to RemoteConfig:
#   api_prefix=""  # Empty string for direct connection
```

### Async Usage (Recommended)

```python
import asyncio
import os
from dotenv import load_dotenv
from owl_browser import OwlBrowser, RemoteConfig

load_dotenv()

async def main():
    config = RemoteConfig(
        url=os.environ["OWL_ENDPOINT"],
        token=os.environ["OWL_TOKEN"]
    )

    async with OwlBrowser(config) as browser:
        # Create a browser context
        ctx = await browser.create_context()
        context_id = ctx["context_id"]

        # Navigate to a page
        await browser.navigate(context_id=context_id, url="https://example.com")

        # Wait for page to fully load
        await browser.wait_for_network_idle(context_id=context_id)

        # Click an element (CSS selector)
        await browser.click(context_id=context_id, selector="button#submit")

        # Click using natural language (semantic matcher)
        await browser.click(context_id=context_id, selector="submit button")

        # Type into an input (natural language — no need to inspect the DOM)
        await browser.type(context_id=context_id, selector="email input", text="user@example.com")

        # Take a screenshot (returns base64)
        screenshot = await browser.screenshot(context_id=context_id)

        # Extract text content
        text = await browser.extract_text(context_id=context_id, selector="h1")

        # Get full page as markdown
        markdown = await browser.get_markdown(context_id=context_id)

        # Close the context
        await browser.close_context(context_id=context_id)

asyncio.run(main())
```
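
In the example above, an exception raised between `create_context` and `close_context` would leave the context open. One way to guard against that is a small async context manager, sketched below. It relies only on the `create_context` and `close_context` calls shown above; `open_context` and `FakeBrowser` are hypothetical names, and passing keyword parameters through to `create_context` is an assumption.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def open_context(browser, **params):
    """Create a context, yield its ID, and always close it afterwards."""
    ctx = await browser.create_context(**params)
    context_id = ctx["context_id"]
    try:
        yield context_id
    finally:
        await browser.close_context(context_id=context_id)


class FakeBrowser:
    """Stand-in so the sketch runs without a server; not part of the SDK."""

    def __init__(self):
        self.closed = []

    async def create_context(self, **params):
        return {"context_id": "ctx_000001"}

    async def close_context(self, context_id):
        self.closed.append(context_id)


async def demo():
    browser = FakeBrowser()
    try:
        async with open_context(browser) as context_id:
            raise RuntimeError("simulated failure mid-automation")
    except RuntimeError:
        pass
    return browser.closed


closed = asyncio.run(demo())
print(closed)
```

With a real `OwlBrowser` instance, the body of the `async with open_context(browser)` block would hold the `navigate`, `click`, and other calls.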

---

## Node.js SDK

### Installation

```bash
npm install @olib-ai/owl-browser-sdk dotenv
```

### Quick Start

```typescript
import 'dotenv/config';
import { OwlBrowser } from '@olib-ai/owl-browser-sdk';

const browser = new OwlBrowser({
  url: process.env.OWL_ENDPOINT!,
  token: process.env.OWL_TOKEN!,
  apiPrefix: ''  // '' for direct, '/api' for nginx proxy
});

await browser.connect();

// Create a context
const ctx = await browser.createContext();
const contextId = ctx.context_id;

// Navigate
await browser.navigate({ context_id: contextId, url: 'https://example.com' });

// Wait for page load
await browser.waitForNetworkIdle({ context_id: contextId });

// Interact (CSS selectors)
await browser.click({ context_id: contextId, selector: 'button#submit' });
await browser.type({ context_id: contextId, selector: '#email', text: 'user@example.com' });

// Or use natural language — the semantic matcher finds the right element
await browser.click({ context_id: contextId, selector: 'submit button' });
await browser.type({ context_id: contextId, selector: 'email input', text: 'user@example.com' });

// Extract content
const screenshot = await browser.screenshot({ context_id: contextId });
const markdown = await browser.getMarkdown({ context_id: contextId });

// Close
await browser.closeContext({ context_id: contextId });
await browser.close();
```

---

## REST API (Direct)

Use the REST API directly when not using an SDK.

### Authentication

All API calls (except `/health`) require a Bearer token:

```
Authorization: Bearer <your-token>
```

Load credentials from your `.env` file:

```bash
# Load .env variables into the current shell (also handles quoted values)
set -a; . ./.env; set +a
```

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check (no auth) |
| GET | `/api/tools` | List all tools |
| POST | `/api/execute/{tool_name}` | Execute a tool |
| GET | `/api/schema` | Full tool schema |

### Example: Execute a Tool

```bash
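# Paths use the /api prefix from the Endpoints table; drop it when
# connecting directly to the HTTP server (e.g., http://localhost:8080).

# Create context (the response contains the new context_id)
curl -X POST -H "Authorization: Bearer $OWL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' \
  "$OWL_ENDPOINT/api/execute/browser_create_context"

# Navigate (replace ctx_000001 with the context_id returned above)
curl -X POST -H "Authorization: Bearer $OWL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"context_id": "ctx_000001", "url": "https://example.com"}' \
  "$OWL_ENDPOINT/api/execute/browser_navigate"

# Screenshot
curl -X POST -H "Authorization: Bearer $OWL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"context_id": "ctx_000001"}' \
  "$OWL_ENDPOINT/api/execute/browser_screenshot"
```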
```
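
The same calls can be made from Python without an SDK. The sketch below only builds the request with the standard library so that the URL, headers, and body can be inspected. `build_execute_request` is a hypothetical helper; its `api_prefix` default follows the `/api` paths in the Endpoints table and can be set to `""` for a direct connection.

```python
import json
import urllib.request


def build_execute_request(endpoint, token, tool_name, params=None, api_prefix="/api"):
    """Build (but do not send) a POST request that executes one tool."""
    url = f"{endpoint.rstrip('/')}{api_prefix}/execute/{tool_name}"
    body = json.dumps(params or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_execute_request(
    "https://your-domain.com",
    "your-secret-token",
    "browser_navigate",
    {"context_id": "ctx_000001", "url": "https://example.com"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response format is not covered here, so parsing is left out of the sketch.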

---

## Flow JSON Format

Flows are portable JSON files that define a sequence of browser automation steps.
They work with both the Python and Node.js SDKs, and can be created in the
Owl Browser control panel UI.

### Structure

```json
{
  "name": "My Automation Flow",
  "description": "Description of what this flow does",
  "steps": [
    {
      "type": "browser_navigate",
      "url": "https://example.com"
    },
    {
      "type": "browser_wait_for_network_idle"
    },
    {
      "type": "browser_type",
      "selector": "#email",
      "text": "user@example.com"
    },
    {
      "type": "browser_click",
      "selector": "#submit"
    },
    {
      "type": "browser_extract_text",
      "selector": ".result",
      "expected": {
        "contains": "Success"
      }
    }
  ]
}
```

### Step Fields

- `type` (required): Tool name (e.g., `browser_navigate`, `browser_click`)
- `description` (optional): Human-readable step description
- `selected` / `enabled` (optional): Set to `false` to skip the step
- `expected` (optional): Validate the step result
- All other fields are the tool's parameters (without `context_id`, which is
  injected automatically by the executor)
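
These field rules can be checked before a flow is handed to an executor. The following is a minimal sketch of such a pre-flight check, not the SDKs' own validation; `runnable_steps` is a hypothetical helper, and rejecting an explicit `context_id` is a choice made for this sketch.

```python
def runnable_steps(flow: dict) -> list:
    """Return the steps an executor would run, after basic sanity checks."""
    steps = []
    for i, step in enumerate(flow.get("steps", [])):
        if "type" not in step:
            raise ValueError(f"step {i} is missing the required 'type' field")
        if "context_id" in step:
            raise ValueError(f"step {i} sets 'context_id'; the executor injects it")
        # A step is skipped when either flag is explicitly false.
        if step.get("selected") is False or step.get("enabled") is False:
            continue
        steps.append(step)
    return steps


flow = {
    "name": "My Automation Flow",
    "steps": [
        {"type": "browser_navigate", "url": "https://example.com"},
        {"type": "browser_screenshot", "enabled": False},
        {"type": "browser_click", "selector": "#submit"},
    ],
}
print([s["type"] for s in runnable_steps(flow)])
```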

### Variable Resolution

Use `${prev}` to reference the previous step's result:

```json
{
  "steps": [
    {
      "type": "browser_get_page_info"
    },
    {
      "type": "browser_navigate",
      "url": "${prev.url}/about"
    }
  ]
}
```

Supported syntax:
- `${prev}` - Entire previous result
- `${prev.field}` - Field in previous result
- `${prev[0]}` - Array element
- `${prev[0].id}` - Nested access
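
To make the syntax concrete, here is a small Python sketch of how such placeholders could be resolved. It is an illustrative reimplementation under assumptions, not the SDKs' actual resolver: in particular, returning the raw value for a lone placeholder and stringifying embedded ones is a guess at reasonable behavior.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{prev((?:\.\w+|\[\d+\])*)\}")
_SEGMENT = re.compile(r"\.(\w+)|\[(\d+)\]")


def _lookup(prev, path: str):
    """Walk a path such as '[0].id' through nested dicts and lists."""
    value = prev
    for key, index in _SEGMENT.findall(path):
        value = value[key] if key else value[int(index)]
    return value


def resolve(template: str, prev):
    """Substitute ${prev...} placeholders in a string parameter."""
    whole = _PLACEHOLDER.fullmatch(template)
    if whole:
        # A lone placeholder keeps the original type (dict, list, number...).
        return _lookup(prev, whole.group(1))
    return _PLACEHOLDER.sub(lambda m: str(_lookup(prev, m.group(1))), template)


print(resolve("${prev.url}/about", {"url": "https://example.com"}))
print(resolve("${prev[0].id}", [{"id": 7}]))
```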

### Expectations

Validate step results:

```json
{
  "type": "browser_extract_text",
  "selector": "#count",
  "expected": {
    "greaterThan": 0,
    "field": "length"
  }
}
```

Supported expectations:
- `equals` - Exact match
- `contains` - String contains
- `length` - Array/string length
- `greaterThan` - Numeric comparison
- `lessThan` - Numeric comparison
- `notEmpty` - Not null/undefined/empty
- `matches` - Regex pattern match
- `field` - Not a check itself: the nested field path that the other expectations are applied to (e.g., `"data.count"`)
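
The expectation keys can be read as simple predicates over the step result. The sketch below is one possible interpretation, not the executor's actual logic. It assumes that all listed checks must pass, that `field` selects the value to check, and that `length` as a field on a string or list means its size.

```python
import re


def _get_field(result, path: str):
    """Follow a dotted path; 'length' on a string or list maps to len()."""
    value = result
    for part in path.split("."):
        if part == "length" and isinstance(value, (str, list)):
            value = len(value)
        else:
            value = value[part]
    return value


def check_expected(result, expected: dict) -> bool:
    """Return True when the result satisfies every listed expectation."""
    value = _get_field(result, expected["field"]) if "field" in expected else result
    checks = {
        "equals": lambda want: value == want,
        "contains": lambda want: want in value,
        "length": lambda want: len(value) == want,
        "greaterThan": lambda want: value > want,
        "lessThan": lambda want: value < want,
        "notEmpty": lambda want: (value not in (None, "", [], {})) == want,
        "matches": lambda want: re.search(want, value) is not None,
    }
    return all(checks[name](want) for name, want in expected.items() if name != "field")


print(check_expected("Success: saved", {"contains": "Success"}))
print(check_expected("42 items", {"greaterThan": 0, "field": "length"}))
print(check_expected({"data": {"count": 3}}, {"field": "data.count", "equals": 3}))
```

An unknown expectation key raises `KeyError` in this sketch, which makes typos in a flow file visible early.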

### Conditional Branching (Node.js SDK)

```json
{
  "type": "condition",
  "condition": {
    "source": "previous",
    "operator": "equals",
    "field": "success",
    "value": true
  },
  "onTrue": [
    { "type": "browser_click", "selector": "#continue" }
  ],
  "onFalse": [
    { "type": "browser_screenshot" }
  ]
}
```
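
The branching rule can be summarized in a few lines. The sketch below is written in Python purely for illustration (the feature itself is listed for the Node.js SDK) and handles only the `source: "previous"` and `operator: "equals"` combination shown above; `pick_branch` is a hypothetical name.

```python
def pick_branch(step: dict, previous: dict) -> list:
    """Return the onTrue or onFalse steps for a 'condition' step."""
    cond = step["condition"]
    if cond.get("source") != "previous" or cond.get("operator") != "equals":
        raise NotImplementedError("sketch only handles source=previous, operator=equals")
    matched = previous.get(cond["field"]) == cond["value"]
    return step.get("onTrue", []) if matched else step.get("onFalse", [])


condition_step = {
    "type": "condition",
    "condition": {"source": "previous", "operator": "equals", "field": "success", "value": True},
    "onTrue": [{"type": "browser_click", "selector": "#continue"}],
    "onFalse": [{"type": "browser_screenshot"}],
}
print(pick_branch(condition_step, {"success": True}))
print(pick_branch(condition_step, {"success": False}))
```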

Full flow authoring guide (variables, capture, extract with JSONPath/regex, for_each, retry/timeout, notifications, recipes): https://www.owlbrowser.net/agent-flows.md

---

## Tools Reference

All browser tools require `context_id` (string) as the first parameter; it is omitted below for brevity.
Exceptions: `browser_create_context`, `browser_list_contexts`, and all `http_*` tools.

Tool name mapping:
- REST API / Flow JSON: use as-is (e.g., `browser_navigate`)
- Python SDK: strip `browser_` prefix → `navigate(context_id=cid, url=...)`
- Node.js SDK: strip `browser_` prefix, camelCase or snake_case → `browser.navigate({context_id, url})`
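
The mapping rules above are mechanical, so they can be expressed as two small functions. This is a sketch of the naming convention only, with hypothetical helper names. Leaving names without the `browser_` prefix (such as `http_*` tools) unchanged apart from casing is an assumption.

```python
PREFIX = "browser_"


def to_python_method(tool_name: str) -> str:
    """browser_wait_for_network_idle -> wait_for_network_idle"""
    return tool_name[len(PREFIX):] if tool_name.startswith(PREFIX) else tool_name


def to_node_method(tool_name: str) -> str:
    """browser_wait_for_network_idle -> waitForNetworkIdle"""
    head, *rest = to_python_method(tool_name).split("_")
    return head + "".join(word.capitalize() for word in rest)


for name in ["browser_navigate", "browser_wait_for_network_idle", "browser_get_markdown"]:
    print(f"{name}: python={to_python_method(name)} node={to_node_method(name)}")
```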

```yaml
browser_create_context:
  description: "Create a new isolated browser context with its own cookies, storage, a"
  params:
    - profile_path (string): "Path to a browser profile JSON file (or upload file via multipart/form"
    - os (enum): "Filter profiles by operating system. [windows, macos, linux]"
    - gpu (string): "Filter profiles by GPU vendor/model."
    - screen_size (enum): "Screen resolution for the browser context. [1920x1080, 2560x1440, 3440x1440, 3840x2160, 3840x1600, 5120x2160, 5120x1440]"
    - timezone (string): "Override browser timezone."
    - resource_blocking (boolean): "Enable or disable resource blocking (ads, trackers, analytics)."
    - chrome_version_id (integer): "Pin this context to a specific row in the chrome_versions table (FK by"
    - chrome_version_major (enum): "Pin this context to a specific Chrome major. [143, 144, 145, 146, 147, 148]"
    - proxy_type (enum): "Type of proxy server to use. [http, https, socks4, socks5, socks5h, gae]"
    - proxy_host (string): "Proxy server hostname or IP address (e.g., '127.0.0.1' or 'proxy.examp"
    - proxy_port (integer): "Proxy server port number (e.g., 8080 for HTTP proxy, 9050 for Tor)"
    - proxy_username (string): "Username for proxy authentication."
    - proxy_password (string): "Password for proxy authentication."
    - proxy_stealth (boolean): "Enable stealth mode to prevent proxy/VPN detection."
    - proxy_ca_cert_path (string): "Path to custom CA certificate file (.pem, .crt, .cer) for SSL intercep"
    - proxy_ca_key_path (string): "Path to CA private key file for GAE/private app proxy MITM."
    - proxy_trust_custom_ca (boolean): "Trust the custom CA certificate for SSL interception."
    - is_tor (boolean): "Explicitly mark this proxy as a Tor connection."
    - tor_control_port (integer): "Tor control port for circuit isolation."
    - tor_control_password (string): "Password for Tor control port authentication."
    - llm_enabled (boolean): "Enable or disable LLM features for this context."
    - llm_use_builtin (boolean): "Use the built-in llama-server for LLM inference."
    - llm_endpoint (string): "External LLM API endpoint URL (e.g., 'https://api.openai.com/v1' for O"
    - llm_model (string): "External LLM model name (e.g., 'gpt-4-vision-preview' for OpenAI)."
    - llm_api_key (string): "API key for the external LLM provider."
browser_go:
  description: "One-shot browser navigation tool."
  params:
    - url (string, required): "The full URL to navigate to, including protocol (e.g., 'https://exampl"
    - wait_until (enum): "When to consider navigation complete: '' (return immediately), 'load'  [load, domcontentloaded, networkidle, fullscroll]"
    - timeout (integer): "Maximum time to wait for navigation in milliseconds."
    - output (enum): "Output format. [html, text, markdown, pngView, pngFull]"
    - os (enum): "Filter profiles by operating system. [windows, macos, linux]"
    - use_tor (boolean): "Use TOR proxy for anonymous browsing."
browser_search:
  description: "One-shot web search tool."
  params:
    - query (string, required): "The search query string (e.g., 'best restaurants in NYC', 'python tuto"
    - provider (enum): "Search engine provider. [duckduckgo, google]"
    - page (integer): "Page number for paginated results (1-based)."
    - os (enum): "Filter profiles by operating system. [windows, macos, linux]"
    - use_tor (boolean): "Use TOR proxy for anonymous browsing."
browser_close_context:
  description: "Close a browser context and release all associated resources including"
browser_list_contexts:
  description: "List all currently active browser contexts with their IDs, creation ti"
browser_navigate:
  description: "Navigate the browser to a specified URL."
  params:
    - url (string, required): "The full URL to navigate to, including protocol (e.g., 'https://exampl"
    - wait_until (enum): "When to consider navigation complete: '' (return immediately, default) [load, domcontentloaded, networkidle, fullscroll]"
    - timeout (integer): "Maximum time to wait for navigation in milliseconds."
browser_reload:
  description: "Reload the current page."
  params:
    - ignore_cache (boolean): "When true, performs a hard reload that bypasses the browser cache and "
    - wait_until (enum): "When to consider reload complete: '' (return immediately), 'load' (wai [load, domcontentloaded, networkidle, fullscroll]"
    - timeout (integer): "Maximum time to wait for reload in milliseconds."
browser_go_back:
  description: "Navigate back to the previous page in the browser's history stack."
  params:
    - wait_until (enum): "When to consider navigation complete: '' (return immediately), 'load'  [load, domcontentloaded, networkidle, fullscroll]"
    - timeout (integer): "Maximum time to wait for navigation in milliseconds."
browser_go_forward:
  description: "Navigate forward to the next page in the browser's history stack."
  params:
    - wait_until (enum): "When to consider navigation complete: '' (return immediately), 'load'  [load, domcontentloaded, networkidle, fullscroll]"
    - timeout (integer): "Maximum time to wait for navigation in milliseconds."
browser_can_go_back:
  description: "Check if navigation back is possible."
browser_can_go_forward:
  description: "Check if navigation forward is possible."
browser_set_content:
  description: "Set the page's HTML content directly."
  params:
    - html (string, required): "The HTML content to set as the page body."
browser_click:
  description: "Click on an element using CSS selector, XY coordinates, or natural lan"
  params:
    - selector (string, required): "Target element to click."
    - hold_ms (integer): "Duration in milliseconds to hold the mouse button down before releasin"
    - index (integer): "When multiple elements match the selector, click the Nth element (0-ba"
browser_type:
  description: "Type text into an input field with human-like keystroke simulation."
  params:
    - selector (string): "Target input field."
    - text (string, required): "The text to type into the input field."
    - index (integer): "When multiple elements match the selector, target the Nth element (0-b"
browser_pick:
  description: "Select an option from a dropdown or select element."
  params:
    - selector (string, required): "Target dropdown/select element."
    - value (string, required): "The option to select."
browser_press_key:
  description: "Press a keyboard key."
  params:
    - key (string, required): "Key to press."
browser_submit_form:
  description: "Submit the currently focused form by simulating an Enter key press."
browser_drag_drop:
  description: "Perform a mouse drag operation from start coordinates to end coordinat"
  params:
    - start_x (number, required): "X coordinate (in pixels from left edge) where the drag starts."
    - start_y (number, required): "Y coordinate (in pixels from top edge) where the drag starts"
    - end_x (number, required): "X coordinate (in pixels from left edge) where the drag ends (drop loca"
    - end_y (number, required): "Y coordinate (in pixels from top edge) where the drag ends (drop locat"
    - mid_points (array): "Optional array of intermediate [x, y] coordinates for the drag path."
browser_html5_drag_drop:
  description: "Perform HTML5 drag and drop using proper DragEvent dispatch."
  params:
    - source_selector (string, required): "CSS selector for the draggable source element (must have draggable='tr"
    - target_selector (string, required): "CSS selector for the drop target element."
browser_mouse_move:
  description: "Move the mouse cursor along a natural curved path from start to end po"
  params:
    - start_x (number, required): "Starting X coordinate (in pixels) - typically the current mouse positi"
    - start_y (number, required): "Starting Y coordinate (in pixels) - typically the current mouse positi"
    - end_x (number, required): "Target X coordinate (in pixels) where the mouse should move to"
    - end_y (number, required): "Target Y coordinate (in pixels) where the mouse should move to"
    - steps (number): "Number of intermediate points along the path."
    - stop_points (array): "Optional array of [x, y] coordinates where the cursor pauses briefly ("
browser_hover:
  description: "Hover over an element without clicking."
  params:
    - selector (string, required): "CSS selector, position coordinates (e.g., '100x200'), or natural langu"
    - index (integer): "When multiple elements match the selector, target the Nth element (0-b"
browser_double_click:
  description: "Double-click an element."
  params:
    - selector (string, required): "CSS selector, position coordinates (e.g., '100x200'), or natural langu"
    - index (integer): "When multiple elements match the selector, target the Nth element (0-b"
browser_right_click:
  description: "Right-click (context click) on an element to open the context menu."
  params:
    - selector (string, required): "CSS selector, position coordinates (e.g., '100x200'), or natural langu"
    - index (integer): "When multiple elements match the selector, target the Nth element (0-b"
browser_clear_input:
  description: "Clear all text content from an input field."
  params:
    - selector (string, required): "CSS selector or natural language description of the input element to c"
    - index (integer): "When multiple elements match the selector, target the Nth element (0-b"
browser_focus:
  description: "Set focus to an element without clicking it."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to focus"
    - index (integer): "When multiple elements match the selector, target the Nth element (0-b"
browser_blur:
  description: "Remove focus from an element."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to remove "
browser_select_all:
  description: "Select all text content within an input element."
  params:
    - selector (string, required): "CSS selector or natural language description of the input element to s"
browser_keyboard_combo:
  description: "Press a keyboard combination with modifiers."
  params:
    - combo (string, required): "Key combination string using modifiers and keys."
browser_upload_file:
  description: "Upload files to a file input element."
  params:
    - selector (string, required): "CSS selector for the file input element (e.g., 'input[type=\"file\"]', '"
    - file_paths (array, required): "Array of file paths to upload."
browser_is_visible:
  description: "Check if an element is visible on the page."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to check v"
    - index (integer): "When multiple elements match the selector, check the Nth element (0-ba"
browser_is_enabled:
  description: "Check if an element is enabled (not disabled)."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to check e"
    - index (integer): "When multiple elements match the selector, check the Nth element (0-ba"
browser_is_checked:
  description: "Check if a checkbox or radio button is currently checked/selected."
  params:
    - selector (string, required): "CSS selector or natural language description of the checkbox or radio "
    - index (integer): "When multiple elements match the selector, check the Nth element (0-ba"
browser_is_editable:
  description: "Check if an element is editable (not disabled AND not readOnly)."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to check e"
    - index (integer): "When multiple elements match the selector, check the Nth element (0-ba"
browser_count_elements:
  description: "Count the number of elements matching a CSS selector."
  params:
    - selector (string, required): "CSS selector to count matching elements (e.g., 'li.item', 'input[type="
browser_dispatch_event:
  description: "Dispatch a DOM event on an element."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to dispatc"
    - event_type (string, required): "The DOM event type to dispatch (e.g., 'click', 'input', 'change', 'foc"
    - bubbles (boolean): "Whether the event should bubble up through the DOM tree."
browser_get_attribute:
  description: "Get the value of an HTML attribute from an element."
  params:
    - selector (string, required): "CSS selector or natural language description of the element"
    - attribute (string, required): "The attribute name to retrieve (e.g., 'href', 'src', 'value', 'data-id"
    - index (integer): "When multiple elements match the selector, get attribute from the Nth "
browser_get_bounding_box:
  description: "Get the position and size of an element."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to get pos"
    - index (integer): "When multiple elements match the selector, get bounding box of the Nth"
browser_evaluate:
  description: "Execute arbitrary JavaScript code in the page context."
  params:
    - script (string): "JavaScript code to execute in the page context."
    - expression (string): "JavaScript expression to evaluate and return (shorthand for script wit"
    - return_value (boolean): "If true, treats the script as an expression and returns its value."
browser_clipboard_read:
  description: "Read text content from the system clipboard."
browser_clipboard_write:
  description: "Write text content to the system clipboard."
  params:
    - text (string, required): "Text content to write to the system clipboard"
browser_clipboard_clear:
  description: "Clear the system clipboard content."
browser_list_frames:
  description: "List all frames (main frame and iframes) on the current page."
browser_switch_to_frame:
  description: "Switch the browser context to interact with an iframe."
  params:
    - frame_selector (string, required): "Frame identifier: name attribute, frame index as string (e.g., '0', '1"
browser_switch_to_main_frame:
  description: "Switch back to the main (top-level) frame after interacting with an if"
browser_extract_text:
  description: "Extract visible text content from the page or a specific element."
  params:
    - selector (string): "Optional CSS selector or natural language description to extract text "
    - regex (string): "Optional regex pattern to apply to the extracted text."
    - regex_group (integer): "Which capture group to return from the regex match."
    - index (integer): "When multiple elements match the selector, extract text from the Nth e"
browser_screenshot:
  description: "Capture a PNG screenshot with configurable modes."
  params:
    - mode (enum): "Screenshot mode: 'viewport' (default, current visible view), 'element' [viewport, element, fullpage]"
    - selector (string): "CSS selector or natural language description for the element to captur"
    - scale (integer): "Scale percentage for the output image (1-100)."
browser_highlight:
  description: "Visually highlight an element on the page with a colored border and ba"
  params:
    - selector (string, required): "CSS selector or natural language description of the element to highlig"
    - border_color (string): "CSS color for the highlight border."
    - background_color (string): "CSS color for the highlight background (use alpha for transparency)."
    - index (integer): "When multiple elements match the selector, highlight the Nth element ("
browser_show_grid_overlay:
  description: "Display an XY coordinate grid overlay on the page with position labels"
  params:
    - horizontal_lines (number): "Number of horizontal lines to display from top to bottom."
    - vertical_lines (number): "Number of vertical lines to display from left to right."
    - line_color (string): "CSS color for grid lines (use low alpha for visibility)."
    - text_color (string): "CSS color for coordinate labels at intersections."
browser_get_html:
  description: "Extract HTML content with configurable cleaning levels."
  params:
    - selector (string): "Optional CSS selector to get innerHTML of a specific element (e.g., '#"
    - clean_level (enum): "Level of HTML cleaning to apply. [minimal, basic, aggressive]"
browser_get_markdown:
  description: "Convert the page content to clean Markdown format."
  params:
    - include_links (boolean): "Include hyperlinks in markdown output as [text](url)."
    - include_images (boolean): "Include images in markdown output as ![alt](src)."
    - max_length (integer): "Maximum length of the output in characters."
browser_extract_site:
  description: "Extract content from multiple pages of a website."
  params:
    - url (string, required): "Starting URL to begin extraction from"
    - depth (integer): "How many link levels to follow from the starting page."
    - max_pages (integer): "Maximum number of pages to extract."
    - follow_external (boolean): "Whether to follow links to external domains."
    - output_format (enum): "Output format for extracted content: 'markdown' (default), 'text', or  [markdown, text, json]"
    - include_images (boolean): "Include resolved image URLs in output."
    - include_metadata (boolean): "Include page title and description metadata."
    - exclude_patterns (array): "Array of URL patterns to skip (glob patterns)."
    - timeout_per_page (integer): "Timeout per page in milliseconds."
browser_extract_site_progress:
  description: "Get progress of a site extraction job."
  params:
    - job_id (string, required): "The job ID returned from browser_extract_site"
browser_extract_site_result:
  description: "Get the result of a completed site extraction job."
  params:
    - job_id (string, required): "The job ID returned from browser_extract_site"
browser_extract_site_cancel:
  description: "Cancel a running site extraction job."
  params:
    - job_id (string, required): "The job ID to cancel"
browser_extract_json:
  description: "Extract structured data from the page as JSON."
  params:
    - template (string): "Extraction template name for known site types."
    - selector (string): "CSS selector to scope extraction to a specific container (e.g., 'div.p"
browser_detect_site:
  description: "Identify the type of website currently loaded (e.g., 'google_search', "
browser_list_templates:
  description: "List all available JSON extraction templates with their names and desc"
browser_summarize_page:
  description: "Generate an intelligent, structured summary of the current page using "
  params:
    - force_refresh (boolean): "Force regeneration of the page summary even if cached."
browser_query_page:
  description: "Ask a natural language question about the current page content."
  params:
    - query (string, required): "Natural language question to ask about the page content."
browser_llm_status:
  description: "Check if the LLM is ready to use."
browser_nla:
  description: "Execute complex browser automation using natural language commands."
  params:
    - command (string, required): "Natural language instruction for browser automation."
browser_ai_click:
  description: "Click an element described in natural language using AI vision."
  params:
    - description (string, required): "Natural language description of the element to click (e.g., 'search bu"
browser_ai_type:
  description: "Type text into an element described in natural language using AI visio"
  params:
    - description (string, required): "Natural language description of the input element (e.g., 'search box',"
    - text (string, required): "Text to type into the input element"
browser_ai_extract:
  description: "Extract specific information from the page using AI."
  params:
    - what (string, required): "Description of what to extract (e.g., 'all product prices', 'main head"
browser_ai_query:
  description: "Ask a natural language question about the current page."
  params:
    - query (string, required): "Natural language question about the page content"
browser_ai_analyze:
  description: "Perform comprehensive AI analysis of the current page."
browser_find_element:
  description: "Find an element on the page using natural language description."
  params:
    - description (string, required): "Natural language description of elements to find (e.g., 'all buttons',"
    - max_results (integer): "Maximum number of results to return (default: 10)"
browser_scroll_by:
  description: "Scroll the page by a specified number of pixels vertically and/or hori"
  params:
    - y (integer, required): "Vertical scroll amount in pixels."
    - x (integer): "Horizontal scroll amount in pixels."
    - verification_level (string): "Verification level for scroll action."
browser_scroll_to_element:
  description: "Scroll the page to bring a specific element into view."
  params:
    - selector (string, required): "CSS selector or natural language description of the element to scroll "
    - index (integer): "When multiple elements match the selector, scroll to the Nth element ("
browser_scroll_to_top:
  description: "Instantly scroll to the very top of the page."
browser_scroll_to_bottom:
  description: "Instantly scroll to the very bottom of the page."
browser_wait_for_selector:
  description: "Wait for an element matching the selector to appear and become visible"
  params:
    - selector (string, required): "CSS selector or natural language description of the element to wait fo"
    - timeout (integer): "Maximum time to wait in milliseconds."
    - index (integer): "When multiple elements match the selector, wait for the Nth element (0"
browser_wait:
  description: "Pause execution for a fixed number of milliseconds."
  params:
    - timeout (integer, required): "Time to wait in milliseconds before continuing."
browser_wait_for_network_idle:
  description: "Wait until there are no pending network requests for a specified durat"
  params:
    - idle_time (integer): "Duration of network inactivity (no pending requests) required to consi"
    - timeout (integer): "Maximum time to wait for network idle state."
browser_wait_for_function:
  description: "Wait for a custom JavaScript condition to become true."
  params:
    - js_function (string, required): "JavaScript code that returns truthy value when condition is met."
    - polling (integer): "Interval between function evaluations in milliseconds."
    - timeout (integer): "Maximum time to wait for function to return true."
browser_wait_for_url:
  description: "Wait for the page URL to match a specific pattern."
  params:
    - url_pattern (string, required): "URL pattern to match."
    - is_regex (boolean): "Enable glob-style pattern matching with * and ? wildcards."
    - timeout (integer): "Maximum time to wait for URL to match."
browser_get_page_info:
  description: "Get comprehensive information about the current page including URL, ti"
browser_set_viewport:
  description: "Set the browser viewport size for responsive testing."
  params:
    - width (integer, required): "Viewport width in pixels."
    - height (integer, required): "Viewport height in pixels."
browser_reset_viewport:
  description: "Reset the browser viewport to its original default size (from the VM p"
browser_zoom_in:
  description: "Zoom in the page content by 10%."
browser_zoom_out:
  description: "Zoom out the page content by 10%."
browser_zoom_reset:
  description: "Reset page zoom to 100% (default)."
browser_get_console_log:
  description: "Read console logs from the browser."
  params:
    - level (string): "Filter by log level: 'debug', 'info', 'warn', 'error', 'verbose'."
    - filter (string): "Filter logs containing specific text (case-sensitive substring match)."
    - limit (integer): "Maximum number of log entries to return."
browser_clear_console_log:
  description: "Clear all console logs from the browser context."
browser_start_video_recording:
  description: "Begin recording a video of the browser session."
  params:
    - fps (integer): "Frames per second for the recording."
    - codec (string): "Video codec for encoding."
browser_pause_video_recording:
  description: "Temporarily pause video recording without stopping it."
browser_resume_video_recording:
  description: "Resume a paused video recording."
browser_stop_video_recording:
  description: "Stop video recording and save the video file."
browser_get_video_recording_stats:
  description: "Get statistics about the current video recording including frames capt"
browser_download_video_recording:
  description: "Get a download URL for a completed video recording."
browser_start_live_stream:
  description: "Start a live MJPEG video stream of the browser session accessible via "
  params:
    - fps (integer): "Target frames per second for the stream."
    - quality (integer): "JPEG compression quality from 1 (lowest/smallest) to 100 (highest/larg"
browser_stop_live_stream:
  description: "Stop an active live video stream."
browser_get_live_stream_stats:
  description: "Get statistics about an active live stream including connected viewers"
browser_list_live_streams:
  description: "List all currently active live video streams across all contexts with "
browser_get_live_frame:
  description: "Get the latest frame from a live video stream as a base64-encoded imag"
browser_get_demographics:
  description: "Get comprehensive user demographics and context based on IP geolocatio"
browser_get_location:
  description: "Get user's current geographic location based on IP address."
browser_get_datetime:
  description: "Get current date and time information including date, time, day of wee"
browser_get_weather:
  description: "Get current weather conditions for the user's detected location."
browser_detect_captcha:
  description: "Detect if the current page contains a CAPTCHA challenge using heuristi"
browser_classify_captcha:
  description: "Identify the type of CAPTCHA on the page."
browser_solve_text_captcha:
  description: "Solve a text-based CAPTCHA by using a vision model to read distorted cha"
  params:
    - max_attempts (integer): "Maximum number of attempts to solve the CAPTCHA before giving up."
browser_solve_image_captcha:
  description: "Solve an image-selection CAPTCHA (e.g., 'select all images with traffi"
  params:
    - max_attempts (integer): "Maximum number of attempts to solve the CAPTCHA."
    - provider (enum): "CAPTCHA provider hint for optimized solving: 'auto' (detect automatica [auto, owl, recaptcha, cloudflare, hcaptcha]"
browser_solve_captcha:
  description: "Auto-detect and solve any supported CAPTCHA type."
  params:
    - max_attempts (integer): "Maximum number of attempts to solve the CAPTCHA."
    - provider (enum): "CAPTCHA provider hint for optimized solving: 'auto' (detect automatica [auto, owl, recaptcha, cloudflare, hcaptcha]"
browser_get_cookies:
  description: "Retrieve all cookies from the browser context with full details includ"
  params:
    - url (string): "Filter cookies to only those applicable to this URL."
browser_set_cookie:
  description: "Set a cookie in the browser context with full control over all attribu"
  params:
    - url (string, required): "URL to associate with the cookie for domain validation."
    - name (string, required): "The cookie name."
    - value (string, required): "The cookie value."
    - domain (string): "Cookie domain scope."
    - path (string): "URL path that must exist in the request URL for the cookie to be sent."
    - secure (boolean): "If true, cookie is only sent over HTTPS connections."
    - httpOnly (boolean): "If true, cookie cannot be accessed via JavaScript (document.cookie)."
    - sameSite (enum): "Cross-site request restriction: 'strict' (only same-site), 'lax' (same [none, lax, strict]"
    - expires (integer): "Unix timestamp (seconds since epoch) when the cookie expires."
browser_delete_cookies:
  description: "Delete cookies from the browser context."
  params:
    - url (string): "Delete only cookies applicable to this URL."
    - cookie_name (string): "Delete only the cookie with this specific name."
browser_get_headers:
  description: "Get HTTP response headers from the current page or a specific URL in t"
  params:
    - url (string): "Filter headers by URL."
browser_set_proxy:
  description: "Configure proxy settings for the browser context after creation."
  params:
    - type (enum, required): "Proxy protocol type: 'http' (HTTP proxy), 'https' (HTTPS proxy), 'sock [http, https, socks4, socks5, socks5h, gae]"
    - host (string, required): "Proxy server hostname or IP address."
    - port (integer, required): "Proxy server port number."
    - username (string): "Username for proxy authentication."
    - password (string): "Password for proxy authentication."
    - stealth (boolean): "Enable stealth mode to prevent proxy/VPN detection."
    - block_webrtc (boolean): "Block WebRTC to prevent real IP address leaking through STUN/TURN requ"
    - spoof_timezone (boolean): "Automatically spoof browser timezone to match proxy location (detected"
    - spoof_language (boolean): "Automatically spoof browser language preferences to match proxy locati"
    - timezone_override (string): "Manually set timezone instead of auto-detection."
    - language_override (string): "Manually set browser language instead of auto-detection."
    - ca_cert_path (string): "Path to custom CA certificate file (.pem, .crt, .cer) for SSL intercep"
    - ca_key_path (string): "Path to CA private key file for GAE/private app proxy MITM."
    - trust_custom_ca (boolean): "Trust the custom CA certificate for SSL interception."
    - is_tor (boolean): "Explicitly mark this proxy as a Tor connection."
    - tor_control_port (integer): "Tor control port for circuit isolation."
    - tor_control_password (string): "Password for Tor control port authentication."
browser_get_proxy_status:
  description: "Get the current proxy configuration and connection status."
browser_connect_proxy:
  description: "Enable and connect the configured proxy for the browser context."
browser_disconnect_proxy:
  description: "Disable the proxy and revert to direct connection."
browser_set_timezone:
  description: "Set or override the timezone for an existing browser context at runtim"
  params:
    - timezone (string, required): "IANA timezone format (e.g., 'America/New_York', 'Europe/London', 'Asia"
browser_create_profile:
  description: "Create a new browser profile with randomized fingerprints (user agent,"
  params:
    - name (string): "Human-readable name for the profile to help identify it."
browser_load_profile:
  description: "Load a saved browser profile from a JSON file into an existing context"
  params:
    - profile_path (string, required): "Path to the profile .json file to load (or upload file via multipart/f"
browser_save_profile:
  description: "Save the current context state (fingerprints, cookies, settings) to a "
  params:
    - profile_name (string, required): "Name for the saved profile file (without path)."
browser_download_profile:
  description: "Download a saved profile file from the server."
  params:
    - profile_name (string, required): "Name of the profile file to download (as returned by browser_save_prof"
browser_get_profile:
  description: "Get the current profile state for the context as JSON."
browser_update_profile_cookies:
  description: "Update the profile file with the current cookies from the browser."
browser_get_context_info:
  description: "Get context information including the VM profile and fingerprint hashe"
browser_add_network_rule:
  description: "Add a rule to intercept network requests matching a URL pattern."
  params:
    - url_pattern (string, required): "URL pattern to match requests against."
    - action (enum, required): "What to do with matching requests: 'allow' lets them through (useful w [allow, block, mock, redirect]"
    - is_regex (boolean): "Treat url_pattern as a regular expression instead of glob pattern."
    - redirect_url (string): "The URL to redirect matching requests to."
    - mock_body (string): "Response body to return for mocked requests."
    - mock_status (integer): "HTTP status code for mocked responses (e.g., 200, 404, 500)."
    - mock_content_type (string): "Content-Type header for mocked responses."
browser_remove_network_rule:
  description: "Remove a previously added network interception rule by its ID."
  params:
    - rule_id (string, required): "The unique identifier of the rule to remove, returned by browser_add_n"
browser_enable_network_interception:
  description: "Enable or disable network interception for a context."
  params:
    - enable (boolean, required): "Set to true to enable network interception (rules start applying), fal"
browser_get_network_log:
  description: "Retrieve the log of captured network requests and responses for debugg"
browser_get_network_rules:
  description: "List all network interception rules for a context."
browser_clear_network_log:
  description: "Clear all captured network log entries for the context."
browser_enable_network_logging:
  description: "Enable or disable network request/response logging for the context."
  params:
    - enable (boolean, required): "Enable (true) or disable (false) network logging for this context."
browser_disable_network_interception:
  description: "Disable network interception for a context."
browser_disable_network_logging:
  description: "Disable network request/response logging for a context."
browser_set_download_path:
  description: "Configure the directory where browser downloads will be saved."
  params:
    - path (string, required): "Absolute path to the directory where downloaded files will be saved."
browser_get_downloads:
  description: "List all downloads for the context with their current status."
browser_get_active_downloads:
  description: "Get only currently active (in-progress) downloads for the context."
browser_wait_for_download:
  description: "Block until a specific download completes or times out."
  params:
    - download_id (string, required): "The unique identifier of the download to wait for, obtained from brows"
    - timeout (integer): "Maximum time to wait for download completion in milliseconds."
browser_cancel_download:
  description: "Cancel an in-progress download."
  params:
    - download_id (string, required): "The unique identifier of the in-progress download to cancel."
browser_set_dialog_action:
  description: "Configure automatic handling for JavaScript dialogs (alert, confirm, p"
  params:
    - dialog_type (enum, required): "Type of JavaScript dialog to configure: 'alert' (message display), 'co [alert, confirm, prompt, beforeunload]"
    - action (enum, required): "How to automatically handle this dialog type: 'accept' clicks OK/Yes,  [accept, dismiss, accept_with_text, wait]"
    - prompt_text (string): "Default text to enter when accepting prompt dialogs."
browser_get_pending_dialog:
  description: "Check if there's a JavaScript dialog currently waiting for user action"
browser_get_dialogs:
  description: "Get all dialog events that have occurred in the context, including han"
browser_handle_dialog:
  description: "Manually accept or dismiss a specific dialog by its ID."
  params:
    - dialog_id (string, required): "The unique identifier of the dialog to handle, obtained from browser_g"
    - accept (boolean, required): "Set to true to accept (click OK/Yes) or false to dismiss (click Cancel"
    - response_text (string): "Text to enter in prompt dialogs before accepting."
browser_wait_for_dialog:
  description: "Wait for a JavaScript dialog to appear."
  params:
    - timeout (integer): "Maximum time to wait for a dialog to appear in milliseconds."
browser_set_popup_policy:
  description: "Configure how popup windows are handled when JavaScript calls window.o"
  params:
    - policy (enum, required): "How to handle popup windows opened by window.open() or target='_blank' [allow, block, new_tab, background]"
browser_get_tabs:
  description: "List all tabs/windows in the browser context."
browser_switch_tab:
  description: "Switch focus to a specific tab by its ID."
  params:
    - tab_id (string, required): "The unique identifier of the tab to activate/switch to, obtained from "
browser_close_tab:
  description: "Close a specific tab by its ID."
  params:
    - tab_id (string, required): "The unique identifier of the tab to close, obtained from browser_get_t"
browser_new_tab:
  description: "Open a new tab in the browser context."
  params:
    - url (string): "URL to navigate to in the new tab."
browser_get_active_tab:
  description: "Get information about the currently active (focused) tab."
browser_get_tab_count:
  description: "Get the number of open tabs in the context."
browser_get_blocked_popups:
  description: "Get a list of popup URLs that were blocked due to popup policy set to "
browser_get_element_at_position:
  description: "Get detailed information about the DOM element at specific XY coordina"
  params:
    - x (number, required): "X coordinate in pixels from the left edge of the viewport."
    - y (number, required): "Y coordinate in pixels from the top edge of the viewport."
browser_get_interactive_elements:
  description: "Find all interactive elements on the page (buttons, links, inputs, sel"
browser_get_page_map:
  description: "Get a compact structured map of all interactive elements on the page, "
  params:
    - intent (string): "Task description to boost matching elements (e.g., 'buy nvidia jetson'"
    - max_elements (integer): "Maximum number of elements to return (0 = unlimited)."
    - region (string): "Filter to specific page region: 'main', 'nav', 'header', 'footer', 'si"
browser_get_blocker_stats:
  description: "Get statistics about blocked ads, trackers, and analytics."
http_request:
  description: "Make an HTTP/HTTPS request with full control over method, headers, bod"
  params:
    - url (string, required): "The full URL to request, including protocol (e.g., 'https://api.exampl"
    - method (enum): "HTTP method to use. [GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS]"
    - headers (string | object): "HTTP headers as a JSON object (e.g., {\"Content-Type\": \"application/jso"
    - body (string): "Request body content."
    - cookies (string): "Cookies to send with the request."
    - auth_type (enum): "Authentication type. [none, basic, bearer, digest]"
    - auth_username (string): "Username for basic or digest authentication."
    - auth_password (string): "Password for basic or digest authentication."
    - auth_token (string): "Bearer token for bearer authentication (without 'Bearer ' prefix)."
    - proxy_type (enum): "Type of proxy server. [http, https, socks4, socks5, socks5h]"
    - proxy_host (string): "Proxy server hostname or IP address."
    - proxy_port (integer): "Proxy server port number."
    - proxy_username (string): "Username for proxy authentication."
    - proxy_password (string): "Password for proxy authentication."
    - use_tor (boolean): "Use built-in Tor proxy (SOCKS5H at 127.0.0.1:9050) for anonymous reque"
    - follow_redirects (boolean): "Follow HTTP redirects automatically."
    - max_redirects (integer): "Maximum number of redirects to follow."
    - timeout (integer): "Total request timeout in milliseconds."
    - connect_timeout (integer): "Connection timeout in milliseconds."
    - ssl_verify (boolean): "Verify SSL/TLS certificates."
    - ca_cert_path (string): "Path to custom CA certificate bundle file (.pem) for SSL verification."
    - user_agent (string): "Custom User-Agent header string."
    - output (enum): "Response body output format. [text, base64, json]"
http_download:
  description: "Download a file from an HTTP/HTTPS URL and save it to disk."
  params:
    - url (string, required): "The URL of the file to download."
    - output_path (string): "Full path where the file should be saved."
    - headers (string | object): "HTTP headers as a JSON object or string."
    - cookies (string): "Cookies to send with the request."
    - proxy_type (enum): "Type of proxy server. [http, https, socks4, socks5, socks5h]"
    - proxy_host (string): "Proxy server hostname or IP address."
    - proxy_port (integer): "Proxy server port number."
    - proxy_username (string): "Username for proxy authentication."
    - proxy_password (string): "Password for proxy authentication."
    - use_tor (boolean): "Use built-in Tor proxy for anonymous download."
    - timeout (integer): "Download timeout in milliseconds."
    - ssl_verify (boolean): "Verify SSL/TLS certificates."
    - ca_cert_path (string): "Path to custom CA certificate bundle file."
    - resume (boolean): "Resume an interrupted download if the server supports byte ranges."
    - user_agent (string): "Custom User-Agent header string."
http_session_create:
  description: "Create a persistent HTTP session with automatic cookie management."
  params:
    - headers (string | object): "Default HTTP headers as a JSON object or string."
    - user_agent (string): "Default User-Agent for all session requests."
    - follow_redirects (boolean): "Default redirect following behavior."
    - max_redirects (integer): "Default maximum redirects."
    - ssl_verify (boolean): "Default SSL verification."
    - ca_cert_path (string): "Default CA certificate path for the session."
    - proxy_type (enum): "Default proxy type for the session. [http, https, socks4, socks5, socks5h]"
    - proxy_host (string): "Default proxy host."
    - proxy_port (integer): "Default proxy port."
    - proxy_username (string): "Default proxy username."
    - proxy_password (string): "Default proxy password."
    - use_tor (boolean): "Use built-in Tor proxy for all session requests."
http_session_request:
  description: "Make an HTTP request within a persistent session."
  params:
    - session_id (string, required): "The session ID returned by http_session_create."
    - url (string, required): "The full URL to request."
    - method (enum): "HTTP method. [GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS]"
    - headers (string | object): "Additional HTTP headers as a JSON object or string."
    - body (string): "Request body content."
    - cookies (string): "Additional cookies for this request."
    - auth_type (enum): "Authentication type for this request. [none, basic, bearer, digest]"
    - auth_username (string): "Username for authentication."
    - auth_password (string): "Password for authentication."
    - auth_token (string): "Bearer token for authentication."
    - follow_redirects (boolean): "Override session redirect behavior."
    - timeout (integer): "Request timeout in milliseconds."
    - connect_timeout (integer): "Connection timeout in milliseconds."
    - ssl_verify (boolean): "Override session SSL verification."
    - user_agent (string): "Override session User-Agent."
    - output (enum): "Response body output format. [text, base64, json]"
http_session_get_cookies:
  description: "Get all cookies stored in a session."
  params:
    - session_id (string, required): "The session ID."
    - url (string): "Optional URL filter."
http_session_set_cookies:
  description: "Import cookies into a session."
  params:
    - session_id (string, required): "The session ID."
    - cookies (string, required): "Cookies to set."
http_session_close:
  description: "Close and destroy an HTTP session, freeing all stored cookies and reso"
  params:
    - session_id (string, required): "The session ID to close and destroy."
http_session_list:
  description: "List all active HTTP sessions with their IDs, creation time, last used"
browser_webmcp_get_tools:
  description: "Get all WebMCP tools declared by the current page via navigator.modelC"
browser_webmcp_call_tool:
  description: "Execute a WebMCP tool that was declared by the page."
  params:
    - tool_name (string, required): "The name of the WebMCP tool to execute (as registered by the page via "
    - input (object): "JSON object containing input parameters for the tool, matching the too"
browser_webmcp_refresh_tools:
  description: "Trigger a re-scan of the page for declarative <form toolname> elements"
browser_webmcp_get_all_tools:
  description: "Get all WebMCP tools across all browser contexts."
```
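
The `required` markers in the listing above can be checked on the client before a call is sent. The sketch below shows one way to do that in Python, under stated assumptions: the `REQUIRED` table is hand-copied from a few entries above, `build_call` is a hypothetical helper (not part of any Owl Browser SDK), and placing `context_id` next to the tool parameters in a flat JSON body is a guess at the request shape for `POST /execute/{tool_name}`, not a confirmed wire format. Nothing is sent over the network.

```python
import json

# Required parameters, hand-copied from a few entries in the reference above.
REQUIRED = {
    "browser_set_viewport": ("width", "height"),
    "browser_set_cookie": ("url", "name", "value"),
    "browser_wait_for_selector": ("selector",),
    "browser_scroll_to_top": (),
}


def build_call(tool_name, context_id, **params):
    """Return (path, json_body) for a tool call, or raise on bad input.

    Hypothetical helper: the body layout is an assumption, not a documented format.
    """
    if tool_name not in REQUIRED:
        raise KeyError(f"unknown tool: {tool_name}")
    missing = [name for name in REQUIRED[tool_name] if name not in params]
    if missing:
        raise ValueError(f"{tool_name} is missing required params: {missing}")
    body = {"context_id": context_id, **params}
    return f"/execute/{tool_name}", json.dumps(body, sort_keys=True)


path, body = build_call("browser_set_viewport", "ctx-1", width=1280, height=720)
print(path)
print(body)
```

Failing early on a missing parameter keeps the error next to the code that built the call, rather than surfacing as a server-side rejection after a round trip.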

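The network interception entries (`browser_add_network_rule`, `browser_enable_network_interception`) are typically used together. The following is a minimal sketch of preparing parameters for a mocked response, assuming a few things the listing does not state: `network_rule_params` is a hypothetical helper, the default `mock_status` of 200 and `mock_content_type` of `application/json` are choices of this sketch, and treating `redirect_url` as mandatory for redirect rules is a client-side convention (the listing marks it optional). The steps are plain `(tool_name, params)` pairs and are only printed, not executed.

```python
import json

# Allowed values for `action`, copied from the browser_add_network_rule entry.
ACTIONS = ("allow", "block", "mock", "redirect")


def network_rule_params(url_pattern, action, *, is_regex=False, redirect_url=None,
                        mock_body=None, mock_status=200,
                        mock_content_type="application/json"):
    """Build a params dict for browser_add_network_rule (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    params = {"url_pattern": url_pattern, "action": action, "is_regex": is_regex}
    if action == "redirect":
        if not redirect_url:
            raise ValueError("redirect rules need redirect_url")
        params["redirect_url"] = redirect_url
    elif action == "mock":
        # mock_body is a string in the listing, so non-string payloads are serialized.
        params["mock_body"] = (
            mock_body if isinstance(mock_body, str) else json.dumps(mock_body)
        )
        params["mock_status"] = mock_status
        params["mock_content_type"] = mock_content_type
    return params


# Ordered (tool_name, params) pairs: register the rule, then turn interception on.
steps = [
    ("browser_add_network_rule",
     network_rule_params("*/api/user", "mock", mock_body={"name": "test"})),
    ("browser_enable_network_interception", {"enable": True}),
]
for tool_name, params in steps:
    print(tool_name, json.dumps(params, sort_keys=True))
```

Each pair can then be passed to whichever client is in use (SDK method or REST call), with the `context_id` of the target context added.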