Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `extract_data_model` | object | Yes | JSON schema defining the data structure to extract |
| `execution` | string | No | Execution type: `STATIC` (UI: “Static”), `LLM_DOM` (UI: “AI (HTML)”), `LLM_VISION` (UI: “AI (Screenshot)”), or `PROMPT` (UI: “AI (Context)”). Default: `LLM_DOM` |
| `selector` | string | Conditional | XPath to scope the extraction area. Required for `LLM_DOM`, optional for `STATIC` (to wait for the element before extracting), not used for `LLM_VISION` |
| `prompt` | string | Conditional | Additional instructions for LLM extraction. Required for `PROMPT` execution |
| `wait_time` | number | No | Maximum time (ms) to wait for the selector. Default: 15000. Only used when `selector` is provided |
| `selector_error_message` | string | No | Custom error message if the selector is not found |
| `llm_model` | string | No | Override the default LLM model for extraction. Only used for `LLM_DOM` and `LLM_VISION` |
| `keep_html_metadata` | boolean | No | If true, preserve HTML attributes (classes, IDs, data attributes) when sending to the LLM. Default: false (HTML is sanitized). Enable this when you need to extract data from HTML attributes (e.g. IDs). Only used for `LLM_DOM` |
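The conditional rules in the table can be sketched as a small pre-flight check. This is a minimal illustration, not CloudCruise code: `check_parameters` is a hypothetical helper, and the sample XPath and field names are made up. Only the parameter names, execution types, and the "required for" rules come from the table.

```python
# Hypothetical helper: mirrors the "Required" column of the table above.
# It is not part of CloudCruise; it only illustrates the conditional rules.
EXECUTION_TYPES = {"STATIC", "LLM_DOM", "LLM_VISION", "PROMPT"}


def check_parameters(params):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    # The table states that execution defaults to LLM_DOM.
    execution = params.get("execution", "LLM_DOM")
    if "extract_data_model" not in params:
        problems.append("extract_data_model is required")
    if execution not in EXECUTION_TYPES:
        problems.append(f"unknown execution type: {execution}")
    if execution == "LLM_DOM" and not params.get("selector"):
        problems.append("selector is required for LLM_DOM")
    if execution == "PROMPT" and not params.get("prompt"):
        problems.append("prompt is required for PROMPT")
    return problems


# Illustrative parameters; the XPath and schema content are assumptions.
llm_dom_params = {
    "execution": "LLM_DOM",
    "selector": "//main[@id='profile']",
    "wait_time": 15000,
    "extract_data_model": {"type": "object", "properties": {}},
}

print(check_parameters(llm_dom_params))
print(check_parameters({"execution": "PROMPT", "extract_data_model": {}}))
```

Because `execution` defaults to `LLM_DOM`, omitting both `execution` and `selector` is reported as a missing selector.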
Schema Structure
The `extract_data_model` follows JSON Schema with CloudCruise extensions.
Schema Properties
| Property | Description |
|---|---|
| `type` | Data type: `string`, `number`, `boolean`, `array`, `object` |
| `selected` | Set to `true` to include this field in extraction |
| `description` | Description to help the LLM understand what to extract |
| `path` | XPath expression for `STATIC` extraction |
| `mode` | Set to `xpath` for XPath-based extraction |
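As a rough sketch of how these properties might combine, the snippet below builds a schema as a Python dict and prints it as JSON. The nesting under `properties` follows standard JSON Schema and is an assumption here; the field names (`full_name`, `email`, `internal_notes`) and the `selected_fields` helper are hypothetical.

```python
import json

# Assumed shape: standard JSON Schema nesting plus the extension keys
# from the table above. Field names are invented for illustration.
user_schema = {
    "type": "object",
    "properties": {
        "full_name": {
            "type": "string",
            "selected": True,
            "description": "The user's full name as shown in the profile header",
        },
        "email": {
            "type": "string",
            "selected": True,
            "description": "Primary contact email address",
        },
        "internal_notes": {
            "type": "string",
            "selected": False,
            "description": "Not needed for this extraction",
        },
    },
}


def selected_fields(schema):
    """Hypothetical helper: list the property names marked selected: true."""
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if spec.get("selected") is True
    ]


print(json.dumps(user_schema, indent=2))
print(selected_fields(user_schema))
```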
Examples
Basic Extraction with LLM_DOM
Extract user information using AI:
STATIC Extraction with XPath
Extract data using explicit XPath selectors. To extract attribute values (`id`, `href`, `data-*`), point the XPath to the attribute:
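A minimal sketch of such a schema, assuming each property carries its own `path` and `mode` as listed under Schema Properties. The page structure, XPaths, and field names are hypothetical; the last two paths end in an attribute step (`/@...`) to read an attribute value rather than element text.

```python
import json

# Hypothetical STATIC schema; the XPaths target an imagined product page.
product_schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "selected": True,
            "mode": "xpath",
            "path": "//h1[@class='product-title']",
        },
        "product_id": {
            "type": "string",
            "selected": True,
            "mode": "xpath",
            # Attribute step: reads data-product-id instead of the text.
            "path": "//div[@id='product']/@data-product-id",
        },
        "detail_url": {
            "type": "string",
            "selected": True,
            "mode": "xpath",
            "path": "//a[@class='details']/@href",
        },
    },
}

# Assumed node parameters; only the parameter names come from the table.
static_params = {"execution": "STATIC", "extract_data_model": product_schema}

print(json.dumps(static_params, indent=2))
```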
Arrays
Extract Array of Items
Extract a list of items from the page:
Static Array Extraction
To extract an array using `STATIC` execution, provide an XPath that matches multiple elements. Each matched element becomes an item in the array:
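The idea can be sketched with a simple list of strings. The schema shape (a `path` on the array plus a JSON Schema `items` entry) is an assumption, and Python's `xml.etree.ElementTree` stands in for the browser's XPath engine purely to show which elements would match.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: one XPath that matches every <li> in the list.
tags_schema = {
    "type": "array",
    "selected": True,
    "mode": "xpath",
    "path": ".//ul[@id='tags']/li",
    "items": {"type": "string"},
}

# Made-up, well-formed sample markup.
SAMPLE_PAGE = (
    "<html><body><ul id='tags'>"
    "<li>new</li><li>sale</li><li>eco</li>"
    "</ul></body></html>"
)

# Stand-in evaluation: each matched element becomes one array item.
root = ET.fromstring(SAMPLE_PAGE)
tags = [li.text for li in root.findall(tags_schema["path"])]
print(tags)
```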
For an array of objects, set `path` on the array to match the repeating container elements, then use relative XPaths for each property within the items.
For a table, for instance, the array `path` matches each `<tr>` row, and each property uses a relative XPath to extract the corresponding cell within that row.
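The row-and-cell pattern can be sketched as follows. The schema nesting (`items` with `properties`) is assumed from standard JSON Schema, the table markup and field names are invented, and `simulate_static_array` is a hypothetical stand-in that uses Python's `xml.etree.ElementTree` rather than the real extraction engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the array path matches each row, and each
# property path is relative to the matched row.
orders_schema = {
    "type": "array",
    "selected": True,
    "mode": "xpath",
    "path": ".//table[@id='orders']/tbody/tr",
    "items": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "mode": "xpath", "path": "./td[1]"},
            "total": {"type": "string", "mode": "xpath", "path": "./td[2]"},
        },
    },
}

# Made-up, well-formed sample markup.
SAMPLE_PAGE = """
<html><body>
<table id="orders"><tbody>
<tr><td>A-100</td><td>19.99</td></tr>
<tr><td>A-101</td><td>5.00</td></tr>
</tbody></table>
</body></html>
"""


def simulate_static_array(schema, markup):
    """Stand-in evaluation: one dict per element matched by the array path."""
    root = ET.fromstring(markup)
    props = schema["items"]["properties"]
    return [
        {name: row.find(spec["path"]).text for name, spec in props.items()}
        for row in root.findall(schema["path"])
    ]


orders = simulate_static_array(orders_schema, SAMPLE_PAGE)
print(orders)
```

Keeping the property paths relative (`./td[1]`) is what ties each cell to its own row; an absolute path would match the same cell for every item.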
Overwrite Arrays
Arrays append by default. If you extract into the same array twice (e.g. in a loop), new items are appended to the existing ones. You can override this behavior by adding the array key to the `overwriteArrayKeys` array of the JSON schema used in an ExtractDatamodel node.
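The difference between the two behaviors can be sketched in a few lines. Placing `overwriteArrayKeys` at the top level of the schema is an assumption, and `merge_extraction` is a hypothetical model of the append and overwrite semantics, not the platform's implementation.

```python
# Assumed placement of overwriteArrayKeys; the field name is invented.
invoice_schema = {
    "type": "object",
    "overwriteArrayKeys": ["invoices"],
    "properties": {
        "invoices": {
            "type": "array",
            "selected": True,
            "items": {"type": "string"},
        },
    },
}


def merge_extraction(stored, extracted, overwrite_array_keys=()):
    """Hypothetical model: arrays append unless their key is listed."""
    merged = dict(stored)
    for key, value in extracted.items():
        if isinstance(value, list) and key not in overwrite_array_keys:
            merged[key] = list(merged.get(key, [])) + value
        else:
            merged[key] = value
    return merged


pages = (["inv-1", "inv-2"], ["inv-3"])

# Default behavior: every loop iteration appends.
appended = {}
for page in pages:
    appended = merge_extraction(appended, {"invoices": page})

# With the key listed, each iteration replaces the previous result.
overwritten = {}
for page in pages:
    overwritten = merge_extraction(
        overwritten, {"invoices": page}, invoice_schema["overwriteArrayKeys"]
    )

print(appended)
print(overwritten)
```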
Access Browser Variables
We allow extraction of some browser variables:
- The complete URL the browser agent is on: `{{window.location.href}}`
- The path name of the current URL: `{{window.location.pathname}}`
- The query string of the current URL: `{{window.location.search}}`
Use these variables with `execution` set to `STATIC`.
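To make the three values concrete, the sketch below derives what each variable would hold for a sample URL, following the standard browser `window.location` semantics. Python's `urllib.parse` is only a stand-in for the browser, the URL is made up, and `browser_variables` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def browser_variables(url):
    """Approximate what each variable would resolve to for a given URL."""
    parts = urlsplit(url)
    return {
        "{{window.location.href}}": url,
        # Browsers report "/" when the URL has no explicit path.
        "{{window.location.pathname}}": parts.path or "/",
        # search keeps the leading "?" and is empty when there is no query.
        "{{window.location.search}}": f"?{parts.query}" if parts.query else "",
    }


values = browser_variables("https://example.com/orders/42?status=open&page=2")
for name, value in values.items():
    print(name, "->", value)
```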
Extract Raw HTML
You can extract the HTML content of the current page using document variables:
- Sanitized HTML (`{{document.sanitized}}`): Extracts a simplified version of the HTML that removes most attributes and keeps only the structure, tags, and content. This gives cleaner input for data extraction and reduces noise when processing HTML.
- Complete HTML (`{{document}}`): Extracts the entire raw HTML with all attributes intact, including classes, IDs, data attributes, styles, and other metadata.
Use these variables with `execution` set to `STATIC`.
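The contrast between the two variants can be illustrated with a rough attribute stripper. This is not CloudCruise's sanitizer, whose exact rules are not described here: the sketch removes every attribute, whereas the sanitized variant is described as removing most of them. `AttributeStripper` and `strip_attributes` are hypothetical names, and the markup is made up.

```python
from html import escape
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Illustrative only: re-emits tags and text, dropping all attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}/>")

    def handle_data(self, data):
        # Re-escape text so the output stays valid markup.
        self.parts.append(escape(data, quote=False))


def strip_attributes(markup):
    parser = AttributeStripper()
    parser.feed(markup)
    parser.close()  # flushes any buffered text
    return "".join(parser.parts)


complete = '<div id="user-7" class="card"><a href="/u/7" data-role="admin">Ada</a></div>'
sanitized = strip_attributes(complete)
print(sanitized)
```

In the stripped output the `id`, `href`, and `data-role` values are gone, which is why attribute-based extraction needs the complete HTML.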
Notes
- Use `STATIC` execution with XPaths for speed and reliability when the page structure is stable
- Use `LLM_DOM` for complex pages or when selectors change frequently
- Add clear descriptions for each field to help the LLM understand what data to extract
- Arrays extracted multiple times (e.g., in a loop) append by default; use `overwriteArrayKeys` to replace

