DocsContent Extractionbrowser_get_html

browser_get_html

Browser Get Html

Extract HTML content with configurable cleaning levels. Optionally target a specific element using a CSS selector to get its innerHTML instead of the full page. When the selector matches multiple elements, returns a JSON array of HTML strings. 'minimal' preserves most structure, 'basic' removes scripts and styles, 'aggressive' strips to essential content. Useful for page analysis, content extraction, and feeding HTML to other processing tools.

Usage Example

1234567891011
import asyncio
from owl_browser import OwlBrowser, RemoteConfig
# Async usage
async with OwlBrowser(config) as browser:
context = await browser.create_context()
context_id = context["context_id"]
await browser.get_html(
context_id=context_id
)

Parameters

Required

context_idstringrequired

The unique identifier of the browser context (e.g., 'ctx_000001')

Optional

selectorstring

Optional CSS selector to get innerHTML of a specific element (e.g., '#content', '.article-body'). When the selector matches multiple elements, returns a JSON array of HTML strings (one per element). When omitted, returns the full page HTML

clean_levelenum
minimalbasicaggressive

Level of HTML cleaning to apply. 'minimal': preserves most structure, 'basic': removes scripts/styles (default), 'aggressive': strips to essential content only. Choose based on how much structure you need

Response

Returns a JSON object with the operation result.

{
  "success": true,
  "result": <value>
}