browser_solve_image_captcha

Use browser_solve_image_captcha when you need to solve image-selection CAPTCHA challenges without a third-party service. It is part of Owl Browser's CAPTCHA Solving toolset and runs inside a self-hosted, source-level stealth engine, so every call inherits the same undetectable browser fingerprint as the rest of your automation, with no separate anti-detect setup required.

Usage Example

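import asyncio

from owl_browser import OwlBrowser, RemoteConfig


async def main(config: RemoteConfig) -> None:
    # Async usage: open a browser session, create a context, then solve the CAPTCHA
    async with OwlBrowser(config) as browser:
        context = await browser.create_context()
        context_id = context["context_id"]

        await browser.solve_image_captcha(
            context_id=context_id
        )

# Build `config` (a RemoteConfig) for your deployment, then run:
# asyncio.run(main(config))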

Parameters

Required

context_id (string, required)

The unique identifier of the browser context (e.g., 'ctx_000001')

Optional

max_attempts (integer)

Maximum number of attempts to solve the CAPTCHA. For image selection, each attempt may select different images. Default: 3

provider (enum)

Allowed values: auto, owl, recaptcha, cloudflare, hcaptcha

CAPTCHA provider hint for optimized solving: 'auto' (detect automatically), 'owl' (Owl test CAPTCHAs), 'recaptcha' (Google reCAPTCHA), 'cloudflare' (Turnstile), 'hcaptcha'. Default: 'auto'
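
The parameters above can be checked before a call is made. The following is a minimal sketch, not part of the SDK: build_solve_args is a hypothetical helper, and the lower bound of 1 on max_attempts is an assumption, since the reference only states the default of 3.

```python
from typing import Any

# Provider hints listed in the parameter reference.
PROVIDERS = {"auto", "owl", "recaptcha", "cloudflare", "hcaptcha"}


def build_solve_args(
    context_id: str, max_attempts: int = 3, provider: str = "auto"
) -> dict[str, Any]:
    """Hypothetical helper: validate the documented parameters and return them as a dict."""
    if not context_id:
        raise ValueError("context_id is required")
    # Assumption: at least one attempt is needed; the docs only give the default (3).
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return {
        "context_id": context_id,
        "max_attempts": max_attempts,
        "provider": provider,
    }
```

Assuming the Python SDK accepts the optional parameters as keyword arguments under the same names, the result could be passed along as browser.solve_image_captcha(**build_solve_args("ctx_000001", provider="hcaptcha")).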

Response

Returns a JSON object with the operation result.

{
  "success": true,
  "result": <value>
}
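
A caller will usually want to check the success flag before using the result. The sketch below assumes the raw response body is available as a JSON string; parse_result is a hypothetical helper, and the shape of "result" is not specified in this reference, so it is returned unchanged.

```python
import json
from typing import Any


def parse_result(raw: str) -> Any:
    """Hypothetical helper: return the 'result' field, or raise if 'success' is not true."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(f"CAPTCHA solve failed: {payload!r}")
    # The type of 'result' is not documented here, so pass it through as-is.
    return payload.get("result")
```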

Frequently Asked Questions

What does browser_solve_image_captcha do?

It solves an image-selection CAPTCHA (e.g., 'select all images with traffic lights'). It uses a vision model with numbered overlays for one-shot analysis and supports reCAPTCHA, Cloudflare Turnstile, and hCaptcha. It belongs to Owl Browser's CAPTCHA Solving category and is available through the REST API, the Python SDK (browser.solve_image_captcha()), the Node.js SDK, and the MCP server.

What parameters does browser_solve_image_captcha accept?

browser_solve_image_captcha accepts 1 required parameter (context_id) and 2 optional parameters. All parameters are sent as JSON in a POST request to /api/execute/browser_solve_image_captcha.
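
As a rough illustration of the REST call described above, the sketch below builds the POST request with Python's standard library. The base URL is an assumption about where your Owl Browser server is reachable, build_request is a hypothetical helper, and authentication is not covered in this reference, so no auth header is set.

```python
import json
import urllib.request

# Endpoint path taken from the answer above.
TOOL_PATH = "/api/execute/browser_solve_image_captcha"


def build_request(
    base_url: str, context_id: str, max_attempts: int = 3, provider: str = "auto"
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the JSON POST request."""
    body = json.dumps(
        {
            "context_id": context_id,
            "max_attempts": max_attempts,
            "provider": provider,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + TOOL_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(build_request("http://localhost:8080", "ctx_000001")), where the host and port are placeholders for your own deployment.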

Is browser_solve_image_captcha detectable by anti-bot systems like Cloudflare or DataDome?

No. browser_solve_image_captcha executes inside Owl Browser's Chromium engine, which applies fingerprint spoofing at the C++ source level rather than through JavaScript patches. Every tool call shares the same consistent, human-like fingerprint, so anti-bot systems such as Cloudflare, DataDome, and Akamai see an ordinary browser.

Related Tools

browser_detect_captcha

Detect if the current page contains a CAPTCHA challenge using heuristic analysis (no vision model). Returns detection result with confidence score. Use before attempting CAPTCHA solving.

browser_classify_captcha

Identify the type of CAPTCHA on the page. Classifies as text-based, image-selection, checkbox (reCAPTCHA v2), puzzle, audio, or custom. Returns element selectors for the CAPTCHA components.

browser_solve_text_captcha

Solve a text-based CAPTCHA by using a vision model to read distorted characters. Automatically finds the CAPTCHA image, extracts the text, enters it in the input field, and optionally submits.

browser_solve_captcha

Auto-detect and solve any supported CAPTCHA type. Automatically detects whether it's text-based or image-selection and applies the appropriate solving strategy. Most convenient option when CAPTCHA type is unknown.

browser_create_context

Create a new isolated browser context with its own cookies, storage, and optional proxy configuration. Each context acts as an independent browser session. Use this to create multiple isolated browsing sessions, configure proxy/Tor connections, load browser profiles with saved fingerprints, and enable/disable LLM features. Returns a context_id to use with other browser tools.

browser_navigate

Navigate the browser to a specified URL. This is a non-blocking operation that starts navigation and returns immediately. Use browser_wait_for_network_idle or browser_wait_for_selector to wait for the page to fully load. Supports HTTP, HTTPS, file, and data URLs. When wait_until is set (load, networkidle, fullscroll, domcontentloaded) and the page declares WebMCP tools, the response includes a webmcp_tools array containing the full tool definitions (name, description, inputSchema). Use browser_webmcp_call_tool to execute any of these tools directly.

Browse the full Owl Browser API reference or get started with the Python SDK and Node.js SDK.