
browser_extract_json

Browser Extract JSON

Extracts structured data from the page as JSON. For known sites (Google, Amazon, Wikipedia, etc.), it uses predefined templates. For unknown sites, it analyzes the DOM to detect repeating items (products, posts, search results) and extracts their fields (title, link, image, price, rating, date, description). Use 'selector' to scope extraction to a specific container. Returns the page type, an items array, structured JSON-LD (application/ld+json) data, and meta tags.

Usage Example

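import asyncio
from owl_browser import OwlBrowser, RemoteConfig

# Async usage
async def main(config: RemoteConfig) -> None:
    async with OwlBrowser(config) as browser:
        # Create a browser context and keep its identifier
        context = await browser.create_context()
        context_id = context["context_id"]
        # Extract structured data from the page in that context
        result = await browser.extract_json(
            context_id=context_id
        )
        print(result)

# Run with your own RemoteConfig instance (construction not shown here):
# asyncio.run(main(config))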

Parameters

Required

context_id (string, required)

The unique identifier of the browser context (e.g., 'ctx_000001').

Optional

template (string)

Extraction template name for known site types. Available: 'google_search', 'duckduckgo_search', 'wikipedia', 'amazon_product', 'github_repo', 'twitter_feed', 'reddit_thread'. Leave empty for auto-detection based on the URL.

selector (string)

CSS selector to scope extraction to a specific container (e.g., 'div.product-grid', '#results'). When set, only elements within this container are analyzed for repeating patterns.
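
The parameters above can be assembled and checked before the call is made. The following is a minimal sketch, not part of the SDK: build_extract_args is a hypothetical helper, and it assumes that extract_json accepts template and selector as keyword arguments in the same way it accepts context_id.

```python
from typing import Optional

# Template names copied from the 'template' parameter description.
KNOWN_TEMPLATES = frozenset({
    "google_search", "duckduckgo_search", "wikipedia", "amazon_product",
    "github_repo", "twitter_feed", "reddit_thread",
})


def build_extract_args(context_id: str,
                       template: Optional[str] = None,
                       selector: Optional[str] = None) -> dict:
    """Hypothetical helper: build keyword arguments for extract_json.

    Optional values that are empty are left out, so the tool falls back
    to URL-based auto-detection and whole-page analysis.
    """
    if not context_id:
        raise ValueError("context_id is required")
    args = {"context_id": context_id}
    if template:
        if template not in KNOWN_TEMPLATES:
            raise ValueError(f"unknown template: {template!r}")
        args["template"] = template
    if selector:
        args["selector"] = selector
    return args
```

Under those assumptions, a scoped extraction might be written as await browser.extract_json(**build_extract_args(context_id, selector="div.product-grid")).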

Response

Returns a JSON object with the operation result.

{
  "success": true,
  "result": <value>
}
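
A caller will usually want to check the success flag before touching the result. The sketch below assumes the response reaches Python as a dict shaped like the envelope above; unwrap_result is a hypothetical helper, and the inner keys in the sample ("items", "title") are illustrative guesses rather than a documented schema.

```python
from typing import Any


def unwrap_result(response: Any) -> Any:
    """Hypothetical helper: return the "result" payload of a response.

    Raises RuntimeError when the response is not a dict or when its
    "success" flag is missing or false.
    """
    if not isinstance(response, dict) or not response.get("success"):
        raise RuntimeError(f"extract_json did not succeed: {response!r}")
    return response.get("result")


# Sample envelope; the contents of "result" are invented for illustration.
sample = {"success": True, "result": {"items": [{"title": "Example"}]}}
payload = unwrap_result(sample)
for item in payload.get("items", []):
    print(item.get("title"))
```

Raising on a failed response keeps error handling in one place, so the code that reads the extracted items does not need to re-check the flag.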