browser_extract_text

Browser Extract Text

Extracts the visible text of the page, or of a specific element, as plain text with HTML tags stripped. To target an element, pass a CSS selector or a natural language description. When the selector matches multiple elements, the result is a JSON array of text strings. An optional regex pattern filters or extracts specific content from the text; when multiple elements match, it is applied to each element separately. Useful for reading page content, extracting article text, getting form values, or pulling out specific data such as numbers, emails, or prices.

Usage Example

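import asyncio
from owl_browser import OwlBrowser, RemoteConfig

# Async usage
async def main(config: RemoteConfig) -> None:
    async with OwlBrowser(config) as browser:
        context = await browser.create_context()
        context_id = context["context_id"]
        # No selector given, so text is extracted from the entire page
        result = await browser.extract_text(
            context_id=context_id
        )
        print(result)

# Run with your own RemoteConfig instance (its construction is not shown here):
# asyncio.run(main(config))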

Parameters

Required

context_id (string, required)

The unique identifier of the browser context (e.g., 'ctx_000001')

Optional

selector (string)

Optional CSS selector or natural language description to extract text from a specific element. If omitted, extracts all visible text from the entire page. When the selector matches multiple elements, returns a JSON array of text strings (one per element). Examples: '#main-content', '.product-name', 'article'
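As a rough sketch of how the selector might be used from the Python client, the helper below extracts text for several selectors in one pass. It assumes `extract_text` accepts a `selector` keyword that mirrors the parameter name documented here; `extract_regions` is a hypothetical helper, not part of the library.

```python
from typing import Any


async def extract_regions(
    browser: Any, context_id: str, selectors: list[str]
) -> dict[str, Any]:
    """Extract text for each selector and return a selector -> result mapping.

    `browser` is expected to expose an async extract_text() method.
    Passing `selector` as a keyword argument is an assumption based on
    the parameter name in this reference.
    """
    results: dict[str, Any] = {}
    for selector in selectors:
        # Each result may be a single string, or an array of strings
        # when the selector matches more than one element.
        results[selector] = await browser.extract_text(
            context_id=context_id,
            selector=selector,
        )
    return results
```

Inside an `async with OwlBrowser(config) as browser:` block, this could be called as `await extract_regions(browser, context_id, ["#main-content", ".product-name"])`.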

regex (string)

Optional regex pattern to apply to the extracted text. When provided, only content matching the regex is returned. If both selector and regex are provided, the selector extracts text first, then the regex filters the result. Examples: '\d+' to extract numbers, '(?<=Price: )\$[\d.]+' to extract prices, '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' for emails. Double each backslash when the pattern is written inside a JSON string.
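The example patterns can be tried locally before sending them to the tool. The sketch below uses Python's `re` module on an invented sample string; it assumes the tool's regex engine treats these particular patterns the same way Python does, which this reference does not state.

```python
import re

# Invented sample text standing in for text extracted from a page.
page_text = "Order 1042 shipped | Price: $19.99 | Contact: sales@example.com"

# Every run of digits in the text.
numbers = re.findall(r"\d+", page_text)

# A dollar amount that directly follows "Price: " (lookbehind keeps the label out).
price = re.search(r"(?<=Price: )\$[\d.]+", page_text)

# An email address. The TLD class is written [A-Za-z]; a "|" inside a
# character class would match a literal pipe rather than mean "or".
email = re.search(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", page_text)

print(numbers)         # ['1042', '19', '99']
print(price.group(0))  # $19.99
print(email.group(0))  # sales@example.com
```

Note that `\d+` also picks up the digits inside the price, so a narrower selector or a more specific pattern may be needed when only one number is wanted.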

regex_group (string)

Which capture group to return from the regex match. 0 returns the full match (default), 1 returns the first capture group, 2 returns the second, and so on. Example: with regex 'FP ID:\s*(\S+)' and regex_group=1, only the ID value is returned.
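The difference between group 0 and group 1 can be illustrated with Python's `re` module, whose group numbering follows the same convention described above. The sample text and ID value are invented for illustration; this is a local sketch, not the tool's implementation.

```python
import re

# Invented sample text containing the label and an ID value.
text = "Session info FP ID: a1b2c3 ready"

match = re.search(r"FP ID:\s*(\S+)", text)

# Group 0 is the full match, including the "FP ID:" label.
full_match = match.group(0)

# Group 1 is only what the first pair of parentheses captured.
first_group = match.group(1)

print(full_match)   # FP ID: a1b2c3
print(first_group)  # a1b2c3
```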

index (string)

When multiple elements match the selector, extract text from the Nth element (0-based) instead of returning all. Default: not set (returns all matches as array). Set to 0 for first, 1 for second, -1 for last.
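The indexing rules above match ordinary Python list indexing, so they can be modeled locally. `select_by_index` below is a hypothetical stand-in for what the tool does with the list of matched texts; how the tool itself handles an out-of-range index is not documented here, and this sketch simply raises `IndexError`.

```python
from typing import Optional, Union


def select_by_index(
    texts: list[str], index: Optional[int] = None
) -> Union[str, list[str]]:
    """Model the documented `index` semantics on a list of matched texts."""
    if index is None:
        # Not set: all matches are returned as an array.
        return texts
    # 0 is the first match, 1 the second, -1 the last.
    return texts[index]


matches = ["first item", "second item", "third item"]
print(select_by_index(matches))      # ['first item', 'second item', 'third item']
print(select_by_index(matches, 0))   # first item
print(select_by_index(matches, -1))  # third item
```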

Response

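Returns a JSON object with the operation result. The result value holds the extracted text: a plain text string, or a JSON array of text strings when the selector matches multiple elements and no index is set.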

{
  "success": true,
  "result": <value>
}
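Because the result may be a single string or an array, callers may want to normalize it to a list. The following is a minimal sketch under several assumptions: that the caller has the envelope above as a Python dict, and that a multi-element result arrives either as a list or as a JSON array encoded in a string. `texts_from_response` is a hypothetical helper, and the Python client may already unwrap the envelope.

```python
import json
from typing import Any


def texts_from_response(response: dict[str, Any]) -> list[str]:
    """Return the extracted text(s) as a list, whatever shape `result` has."""
    if not response.get("success"):
        raise RuntimeError(f"extract_text failed: {response!r}")

    result = response.get("result")
    if isinstance(result, list):
        return [str(item) for item in result]
    if isinstance(result, str):
        stripped = result.strip()
        # Assumption: a multi-match result may be a JSON array inside a string.
        if stripped.startswith("["):
            try:
                parsed = json.loads(stripped)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, list):
                return [str(item) for item in parsed]
        return [result]
    return [] if result is None else [str(result)]


print(texts_from_response({"success": True, "result": "Hello"}))       # ['Hello']
print(texts_from_response({"success": True, "result": '["a", "b"]'}))  # ['a', 'b']
```

One caveat with this approach: page text that itself happens to be a valid JSON array would be split into items, so the string-parsing branch should be dropped if the client is known to return real lists.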