Extract Datamodel

The Extract Datamodel node extracts structured data from the current page based on a JSON schema. This is useful for scraping data, validating page content, or capturing information for later use in the workflow.

Parameters

Parameter	Type	Required	Description
`extract_data_model`	object	Yes	JSON schema defining the data structure to extract
`execution`	string	No	Execution type: `STATIC` (UI: “Static”), `LLM_DOM` (UI: “AI (HTML)”), `LLM_VISION` (UI: “AI (Screenshot)”), or `PROMPT` (UI: “AI (Context)”). Default: `LLM_DOM`
`selector`	string	Conditional	XPath to scope the extraction area. Required for `LLM_DOM`, optional for `STATIC` (to wait for element before extracting), not used for `LLM_VISION`
`prompt`	string	Conditional	Additional instructions for LLM extraction. Required for `PROMPT` execution
`wait_time`	number	No	Maximum time (ms) to wait for the selector. Default: 15000. Only used when `selector` is provided
`selector_error_message`	string	No	Custom error message if selector is not found
`llm_model`	string	No	Override the default LLM model for extraction. Only used for `LLM_DOM` and `LLM_VISION`
`keep_html_metadata`	boolean	No	If true, preserve HTML attributes (classes, IDs, data attributes) when sending the HTML to the LLM. Default: false (HTML is sanitized). Enable this when you need to extract values stored in HTML attributes, such as IDs. Only used for `LLM_DOM`
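Several parameters depend on the execution type. As a minimal sketch, the hypothetical helper below (`validate_params` is not a CloudCruise API) checks a parameter set against some of the rules stated in the table, assuming `LLM_DOM` when `execution` is omitted. It returns a list of problems instead of raising, so every issue is reported at once.

```python
from typing import Any

# Hypothetical helper, not part of any CloudCruise SDK: it only encodes
# rules stated in the Parameters table.
EXECUTION_TYPES = {"STATIC", "LLM_DOM", "LLM_VISION", "PROMPT"}


def validate_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no encoded rule was violated."""
    problems: list[str] = []
    execution = params.get("execution", "LLM_DOM")  # documented default

    if execution not in EXECUTION_TYPES:
        problems.append(f"unknown execution type: {execution}")
    if "extract_data_model" not in params:
        problems.append("extract_data_model is required")
    if execution == "PROMPT" and not params.get("prompt"):
        problems.append("prompt is required for PROMPT execution")
    if execution == "LLM_VISION" and "selector" in params:
        problems.append("selector is not used for LLM_VISION")
    if "wait_time" in params and "selector" not in params:
        problems.append("wait_time is only used when selector is provided")
    if "llm_model" in params and execution not in {"LLM_DOM", "LLM_VISION"}:
        problems.append("llm_model is only used for LLM_DOM and LLM_VISION")
    if params.get("keep_html_metadata") and execution != "LLM_DOM":
        problems.append("keep_html_metadata is only used for LLM_DOM")
    return problems


# Usage: a PROMPT node without a prompt is flagged.
print(validate_params({"execution": "PROMPT", "extract_data_model": {"type": "object"}}))
```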

Schema Structure

The extract_data_model parameter follows JSON Schema, with CloudCruise-specific extensions (`selected`, `path`, and `mode`):

{
  "type": "object",
  "properties": {
    "field_name": {
      "type": "string",
      "selected": true,
      "description": "Description for LLM extraction",
      "path": "//xpath/expression",
      "mode": "xpath"
    }
  }
}

Schema Properties

Property	Description
`type`	Data type: `string`, `number`, `boolean`, `array`, `object`
`selected`	Set to `true` to include this field in extraction
`description`	Description to help LLM understand what to extract
`path`	XPath expression for `STATIC` extraction
`mode`	Set to `xpath` for XPath-based extraction
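The sketch below shows how the `selected`, `path`, and `mode` properties might be read from a schema. Both helpers are hypothetical, and the sketch assumes that only top-level properties carry `selected`, as in the examples on this page.

```python
from typing import Any

# Hypothetical helpers that read an extract_data_model using the properties
# in the table above. Assumption: only top-level properties carry `selected`.


def selected_fields(schema: dict[str, Any]) -> list[str]:
    """Names of top-level properties marked "selected": true."""
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if spec.get("selected") is True
    ]


def fields_missing_xpath(schema: dict[str, Any]) -> list[str]:
    """Selected fields that lack the `path` + `mode: "xpath"` pair STATIC uses."""
    properties = schema.get("properties", {})
    return [
        name
        for name in selected_fields(schema)
        if not properties[name].get("path") or properties[name].get("mode") != "xpath"
    ]


schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "selected": True,
                     "path": "//span[@data-testid='order-id']", "mode": "xpath"},
        "notes": {"type": "string", "selected": False},
        "email": {"type": "string", "selected": True,
                  "description": "The user's email address"},
    },
}
print(selected_fields(schema))       # ['order_id', 'email']
print(fields_missing_xpath(schema))  # ['email']
```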

Examples

Basic Extraction with LLM_DOM

Extract user information using AI:

{
  "id": "abc123",
  "name": "Extract user details",
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "LLM_DOM",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "user_name": {
          "type": "string",
          "selected": true,
          "description": "The user's full name displayed in the header"
        },
        "email": {
          "type": "string",
          "selected": true,
          "description": "The user's email address"
        },
        "account_status": {
          "type": "string",
          "selected": true,
          "description": "The account status (active, inactive, pending)"
        }
      }
    }
  }
}

STATIC Extraction with XPath

Extract data using explicit XPath selectors:

{
  "id": "abc123",
  "name": "Extract order details",
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "STATIC",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "order_id": {
          "type": "string",
          "selected": true,
          "path": "//span[@data-testid='order-id']",
          "mode": "xpath"
        },
        "total_amount": {
          "type": "string",
          "selected": true,
          "path": "//div[@class='total']//span[@class='amount']",
          "mode": "xpath"
        }
      }
    }
  }
}
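To make the STATIC semantics concrete, here is an illustration that evaluates the schema above against a small, made-up XHTML snippet with Python's `xml.etree.ElementTree`. This is not how CloudCruise evaluates XPath (that happens in the browser): ElementTree supports only a subset of XPath, so `//...` paths are rewritten as `.//...`, and `extract_static` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Illustration only: applies selected xpath-mode fields to a parsed document.


def extract_static(root: ET.Element, schema: dict[str, Any]) -> dict[str, Any]:
    result: dict[str, Any] = {}
    for name, spec in schema["properties"].items():
        if not spec.get("selected") or spec.get("mode") != "xpath":
            continue
        path = spec["path"]
        if path.startswith("//"):
            path = "." + path  # ElementTree needs a path relative to the root
        element = root.find(path)
        # Full text content of the first match, or None when nothing matches.
        result[name] = "".join(element.itertext()).strip() if element is not None else None
    return result


PAGE = """<html><body>
  <span data-testid="order-id">A-1001</span>
  <div class="total">Total: <span class="amount">$42.00</span></div>
</body></html>"""

SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "selected": True,
                     "path": "//span[@data-testid='order-id']", "mode": "xpath"},
        "total_amount": {"type": "string", "selected": True,
                         "path": "//div[@class='total']//span[@class='amount']",
                         "mode": "xpath"},
    },
}

print(extract_static(ET.fromstring(PAGE), SCHEMA))
# {'order_id': 'A-1001', 'total_amount': '$42.00'}
```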

You can also extract HTML attribute values (e.g., id, href, data-*) by ending the XPath with an attribute step such as /@data-product-id:

{
  "product_ids": {
    "type": "array",
    "items": {
      "type": "string"
    },
    "selected": true,
    "path": "//div[@class='product-card']/@data-product-id",
    "mode": "xpath"
  }
}
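ElementTree cannot select attribute nodes directly, so this illustration handles a path ending in `/@name` by matching the elements and then reading the attribute from each one. `extract_attribute_values` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustration only: split the trailing "/@name" step off the XPath, find the
# matching elements, and read that attribute from each of them.


def extract_attribute_values(root: ET.Element, xpath: str) -> list[str]:
    element_path, separator, attribute = xpath.rpartition("/@")
    if not separator:
        raise ValueError("expected an XPath ending in /@attribute")
    if element_path.startswith("//"):
        element_path = "." + element_path  # ElementTree needs a relative path
    values = []
    for element in root.findall(element_path):
        value = element.get(attribute)
        if value is not None:  # skip matches that lack the attribute
            values.append(value)
    return values


PAGE = """<div>
  <div class="product-card" data-product-id="p-101"><h3>Desk Lamp</h3></div>
  <div class="product-card" data-product-id="p-102"><h3>Office Chair</h3></div>
</div>"""

print(extract_attribute_values(ET.fromstring(PAGE),
                               "//div[@class='product-card']/@data-product-id"))
# ['p-101', 'p-102']
```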

Arrays

Extract Array of Items

Extract a list of items from the page:

{
  "id": "abc123",
  "name": "Extract product list",
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "LLM_DOM",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "selected": true,
          "items": {
            "type": "object",
            "properties": {
              "name": {
                "type": "string",
                "description": "Product name"
              },
              "price": {
                "type": "string",
                "description": "Product price"
              },
              "sku": {
                "type": "string",
                "description": "Product SKU"
              }
            }
          },
          "description": "List of all products shown in the search results"
        }
      }
    }
  }
}
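LLM output is not guaranteed to follow the schema, so a downstream consumer may want to check the shape of what it receives. The hypothetical `matches_schema` below checks a value against the subset of types listed under Schema Properties; it is a sketch, not a full JSON Schema validator, and it tolerates missing keys.

```python
from typing import Any

# Hypothetical post-processing check, not a CloudCruise feature and not a full
# JSON Schema validator: it looks only at `type`, `items`, and `properties`.
PY_TYPES: dict[str, Any] = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def matches_schema(value: Any, schema: dict[str, Any]) -> bool:
    expected = schema.get("type")
    if expected not in PY_TYPES:
        return False
    # bool is a subclass of int in Python, so reject it where a number is expected.
    if expected == "number" and isinstance(value, bool):
        return False
    if not isinstance(value, PY_TYPES[expected]):
        return False
    if expected == "array":
        items = schema.get("items")
        return items is None or all(matches_schema(v, items) for v in value)
    if expected == "object":
        properties = schema.get("properties", {})
        # Keys missing from the value are tolerated; present keys must match.
        return all(matches_schema(value[k], s) for k, s in properties.items() if k in value)
    return True


PRODUCTS_SCHEMA = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "selected": True,
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "sku": {"type": "string"},
                },
            },
        }
    },
}

print(matches_schema({"products": [{"name": "Desk Lamp", "price": "$24.99", "sku": "DL-1"}]},
                     PRODUCTS_SCHEMA))  # True
print(matches_schema({"products": [{"name": "Desk Lamp", "price": 24.99}]},
                     PRODUCTS_SCHEMA))  # False: price must be a string
```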

Static Array Extraction

To extract an array using STATIC execution, provide an XPath that matches multiple elements. Each matched element becomes an item in the array:

{
  "id": "abc123",
  "name": "Extract all product names",
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "STATIC",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "product_names": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "selected": true,
          "path": "//div[@class='product-card']//h3[@class='product-name']",
          "mode": "xpath"
        }
      }
    }
  }
}
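The following illustration shows the one-item-per-match behavior using ElementTree and made-up markup; the heading outside a `product-card` container is not included. `extract_text_array` is a hypothetical helper, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Illustration only: one string per element matched by the array's XPath,
# in document order. "//" paths are made relative to the root with ".".


def extract_text_array(root: ET.Element, xpath: str) -> list[str]:
    if xpath.startswith("//"):
        xpath = "." + xpath
    return ["".join(el.itertext()).strip() for el in root.findall(xpath)]


PAGE = """<main>
  <div class="product-card"><h3 class="product-name">Desk Lamp</h3></div>
  <div class="product-card"><h3 class="product-name">Office Chair</h3></div>
  <div class="promo"><h3 class="product-name">Not a product card</h3></div>
</main>"""

print(extract_text_array(ET.fromstring(PAGE),
                         "//div[@class='product-card']//h3[@class='product-name']"))
# ['Desk Lamp', 'Office Chair']
```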

For extracting an array of objects (e.g., table rows with multiple columns), define the path on the array to match the repeating container elements, then use relative XPaths for each property within the items:

{
  "id": "abc123",
  "name": "Extract table rows",
  "action": "EXTRACT_DATAMODEL",
  "parameters": {
    "execution": "STATIC",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "orders": {
          "type": "array",
          "selected": true,
          "path": "//table[@id='orders-table']//tbody/tr",
          "mode": "xpath",
          "items": {
            "type": "object",
            "properties": {
              "order_id": {
                "type": "string",
                "path": "/td[1]",
                "mode": "xpath"
              },
              "customer": {
                "type": "string",
                "path": "/td[2]",
                "mode": "xpath"
              },
              "amount": {
                "type": "string",
                "path": "/td[3]",
                "mode": "xpath"
              },
              "status": {
                "type": "string",
                "path": "/td[4]",
                "mode": "xpath"
              }
            }
          }
        }
      }
    }
  }
}

The array’s path matches each <tr> row, and each property uses a relative XPath to extract the corresponding cell within that row.
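Here is the same row-then-cell evaluation sketched with ElementTree and a made-up table. It assumes that a property path such as `/td[1]` is resolved against the matched row (that is, as `./td[1]`), which is how the example above reads; `extract_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Optional

# Illustration only (ElementTree XPath subset). Assumption: a property path
# such as "/td[1]" is resolved relative to the matched row, i.e. "./td[1]".


def cell_text(row: ET.Element, relative_path: str) -> Optional[str]:
    cell = row.find("." + relative_path)  # "/td[1]" -> "./td[1]"
    return "".join(cell.itertext()).strip() if cell is not None else None


def extract_rows(root: ET.Element, array_spec: dict[str, Any]) -> list[dict[str, Optional[str]]]:
    row_path = array_spec["path"]
    if row_path.startswith("//"):
        row_path = "." + row_path
    properties = array_spec["items"]["properties"]
    # One object per matched row; each property is evaluated inside that row.
    return [
        {name: cell_text(row, spec["path"]) for name, spec in properties.items()}
        for row in root.findall(row_path)
    ]


ORDERS = {
    "type": "array", "selected": True, "mode": "xpath",
    "path": "//table[@id='orders-table']//tbody/tr",
    "items": {"type": "object", "properties": {
        "order_id": {"type": "string", "path": "/td[1]", "mode": "xpath"},
        "customer": {"type": "string", "path": "/td[2]", "mode": "xpath"},
        "amount": {"type": "string", "path": "/td[3]", "mode": "xpath"},
        "status": {"type": "string", "path": "/td[4]", "mode": "xpath"},
    }},
}

PAGE = """<html><body><table id="orders-table">
  <thead><tr><th>ID</th><th>Customer</th><th>Amount</th><th>Status</th></tr></thead>
  <tbody>
    <tr><td>1001</td><td>Ada</td><td>$42.00</td><td>shipped</td></tr>
    <tr><td>1002</td><td>Grace</td><td>$17.50</td><td>pending</td></tr>
  </tbody>
</table></body></html>"""

for row in extract_rows(ET.fromstring(PAGE), ORDERS):
    print(row)
```

Because the array path goes through `tbody`, the header row in `thead` is not treated as an item.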

Overwrite Arrays

By default, extracted arrays are appended to: if you extract into the same array more than once (e.g., in a loop), the new items are added to the existing ones. To replace the array's contents instead, add its key to the top-level overwriteArrayKeys list in the schema. Here’s an example JSON schema you could use in an Extract Datamodel node:

{
  "type": "object",
  "properties": {
    "organization_names": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "selected": true,
      "description": "A list of all organization names. The organization name is written on top of each card and enclosed by a headline tag"
    }
  },
  "overwriteArrayKeys": [
    "organization_names"
  ]
}
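The append and overwrite behavior across repeated extractions can be sketched as a merge function. `merge_extraction` is hypothetical, and it assumes non-array fields are simply replaced on each extraction, which this page does not specify.

```python
from typing import Any

# Sketch of the documented append/overwrite behavior across repeated
# extractions (e.g. loop iterations). Not a CloudCruise API.


def merge_extraction(previous: dict[str, Any], new: dict[str, Any],
                     overwrite_array_keys: list[str]) -> dict[str, Any]:
    merged = dict(previous)
    for key, value in new.items():
        existing = merged.get(key)
        if (isinstance(value, list) and isinstance(existing, list)
                and key not in overwrite_array_keys):
            merged[key] = existing + value  # default: append new items
        else:
            merged[key] = value  # listed keys (and, by assumption, non-arrays) are replaced
    return merged


batches = [["Acme Corp"], ["Globex", "Initech"]]

appended: dict[str, Any] = {}
overwritten: dict[str, Any] = {}
for names in batches:
    appended = merge_extraction(appended, {"organization_names": names}, [])
    overwritten = merge_extraction(overwritten, {"organization_names": names},
                                   ["organization_names"])

print(appended)     # {'organization_names': ['Acme Corp', 'Globex', 'Initech']}
print(overwritten)  # {'organization_names': ['Globex', 'Initech']}
```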

Access Browser Variables

The following browser variables can be extracted:

The complete URL the browser agent is on: {{window.location.href}}
The path name of the current URL: {{window.location.pathname}}
The query string of the current URL: {{window.location.search}}

Here’s an example JSON schema you can use in an Extract Datamodel node:

{
  "type": "object",
  "properties": {
    "current_url": {
      "type": "string",
      "selected": true,
      "path": "{{window.location.href}}",
      "mode": "xpath"
    },
    "path_name": {
      "type": "string",
      "selected": true,
      "path": "{{window.location.pathname}}",
      "mode": "xpath"
    },
    "query_string": {
      "type": "string",
      "selected": true,
      "path": "{{window.location.search}}",
      "mode": "xpath"
    }
  }
}

Note that browser variables can only be extracted with STATIC execution.
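For reference, this sketch shows how the three variables relate for one URL, using Python's `urllib.parse.urlsplit`. It mirrors the browser `Location` semantics: `search` keeps the leading `?` and is empty when there is no query, and the `#fragment` belongs to `href` only. `location_parts` is hypothetical and assumes the URL is already in normalized form.

```python
from urllib.parse import urlsplit

# Sketch: derive pathname and search from one href, the way the browser
# exposes them. urlsplit drops the "?", so it is added back for "search".


def location_parts(href: str) -> dict[str, str]:
    parts = urlsplit(href)
    return {
        "current_url": href,                # {{window.location.href}}
        "path_name": parts.path or "/",     # {{window.location.pathname}}
        "query_string": f"?{parts.query}" if parts.query else "",  # {{window.location.search}}
    }


print(location_parts("https://shop.example.com/orders/42?status=open&page=2#summary"))
```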

Extract Raw HTML

You can extract the HTML content of the current page using document variables:

Sanitized HTML ({{document.sanitized}}): Extracts a simplified version of the HTML that removes most attributes and only maintains the structure, tags, and content. This is useful for cleaner data extraction and reduces noise when processing HTML.
Complete HTML ({{document}}): Extracts the entire raw HTML with all attributes intact, including classes, IDs, data attributes, styles, and other metadata.

Here’s an example JSON schema you can use in an Extract Datamodel node:

{
  "type": "object",
  "properties": {
    "sanitized_html": {
      "type": "string",
      "selected": true,
      "path": "{{document.sanitized}}",
      "mode": "xpath",
      "description": "Clean HTML with structure, tags, and content only"
    },
    "complete_html": {
      "type": "string",
      "selected": true,
      "path": "{{document}}",
      "mode": "xpath",
      "description": "Full raw HTML with all attributes"
    }
  }
}

Note that raw HTML can only be extracted with STATIC execution.
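As a rough illustration of the difference between the two variables, the sketch below rebuilds HTML with all attributes dropped, using Python's `html.parser`. CloudCruise's sanitizer is described only as removing most attributes, so treat this as an approximation; `strip_attributes` is a hypothetical name.

```python
from html import escape
from html.parser import HTMLParser

# Rough, hypothetical approximation of {{document.sanitized}} vs {{document}}:
# tags and text are kept, attributes are dropped entirely.


class AttributeStripper(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")  # attrs are intentionally ignored

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}>")  # self-closing tags such as <br/> become <br>

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))  # re-escape &, <, >


def strip_attributes(html_text: str) -> str:
    parser = AttributeStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


raw = '<div class="card" id="org-7" data-track="x"><h3 style="color:red">Acme &amp; Co</h3><br/></div>'
print(strip_attributes(raw))  # <div><h3>Acme &amp; Co</h3><br></div>
```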

Notes

Use STATIC execution with XPaths for speed and reliability when page structure is stable
Use LLM_DOM for complex pages or when selectors frequently change
Add clear descriptions for each field to help the LLM understand what data to extract
Arrays extracted multiple times (e.g., in a loop) append by default; use overwriteArrayKeys to replace
