Extract Network

The Extract Network node intercepts XHR and Fetch requests made by the page and extracts data from their responses. This is useful when you need to extract a large amount of data and want to reduce runtime, or when the data you need isn’t rendered in the DOM but is available in API responses.

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	URL pattern to match against intercepted requests
`extract_data_model`	object	Yes	Data model schema for extracting specific fields using JSONata (also accepts JSONPath-style `$.field` syntax)
`selector`	string	No	XPath expression to wait for before extracting network data
`wait_time`	number	No	Maximum time (ms) to wait for the `selector` to appear. Only used when `selector` is set. Default: 15000
`full_request`	boolean	No	When true, adds full request/response metadata to the extracted data. Default: false

URL Matching

The node supports three matching modes. If several requests match, the most recent matching request is returned.

Exact Match

If the provided URL exactly matches one request, the network data of that request is returned.

Substring Match

If no exact match is found, the node falls back to substring matching. For example, if you specify /api/users/ as the url, any request whose URL contains /api/users/ matches, and the most recent such request is returned.

Regex Match

For complex patterns, prefix the URL with regex: to use regular expressions.

regex:/api/v1/coverages/-?\d+

This matches:

/api/v1/coverages/123
/api/v1/coverages/-456789

But not:

/api/v1/coverages/ (no number)
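
The precedence described in this section (exact match, then substring fallback, with `regex:` switching to regular expressions, and the most recent match winning) can be sketched in Python. This is an illustration of the documented behavior, not the node's actual implementation: the function name `pick_request` is hypothetical, the URL list is assumed to be ordered oldest to newest, and using an unanchored regex search is an assumption.

```python
import re
from typing import List, Optional

REGEX_PREFIX = "regex:"


def pick_request(pattern: str, urls: List[str]) -> Optional[str]:
    """Return the most recent URL matching `pattern`, or None.

    `urls` is assumed to be ordered from oldest to newest request.
    """
    pattern = pattern.strip()  # URL patterns are trimmed of whitespace
    if pattern.startswith(REGEX_PREFIX):
        # Assumption: the regex may match anywhere in the URL.
        rx = re.compile(pattern[len(REGEX_PREFIX):])
        matches = [u for u in urls if rx.search(u)]
    else:
        # Exact match first, then fall back to substring matching.
        matches = [u for u in urls if u == pattern]
        if not matches:
            matches = [u for u in urls if pattern in u]
    return matches[-1] if matches else None


captured = [
    "https://example.com/api/users/1",
    "https://example.com/api/users/2",
    "https://example.com/api/v1/coverages/",
    "https://example.com/api/v1/coverages/-456789",
]
print(pick_request("/api/users/", captured))
print(pick_request(r"regex:/api/v1/coverages/-?\d+", captured))
```

With the sample list, the substring pattern selects the later of the two `/api/users/` requests, and the regex pattern skips the coverage URL that has no number.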

Full Request

When full_request is enabled, complete request and response metadata is added directly inside each extracted property. This is useful when you need to capture authentication tokens, cookies, or other request/response details.

Default Output (full_request: false)

{
  "user_data": [
    {
      "id": "123",
      "name": "John Doe"
    }
  ]
}

With Full Request (full_request: true)

When full_request is enabled, single-element arrays are unwrapped and the extracted data is placed inside response.body:

{
  "user_data": {
    "request": {
      "url": "https://api.example.com/users/123",
      "method": "GET",
      "headers": {
        "authorization": "Bearer xxx",
        "content-type": "application/json"
      },
      "body": null
    },
    "response": {
      "status": 200,
      "statusText": "OK",
      "headers": {
        "content-type": "application/json"
      },
      "body": {
        "id": "123",
        "name": "John Doe"
      }
    },
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}

The following properties are added:

request.url: The full request URL
request.method: HTTP method (GET, POST, etc.)
request.headers: Request headers sent with the request
request.body: Request body (for POST/PUT requests)
response.status: HTTP status code
response.statusText: HTTP status text
response.headers: Response headers
response.body: The extracted response data
timestamp: When the request was made
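
As a rough illustration of consuming this output, the Python sketch below pulls the bearer token and the extracted body out of a `full_request` result shaped like the sample above. The helper name `bearer_token` is hypothetical, and the sample dict simply mirrors the documented example.

```python
from typing import Any, Dict, Optional

# Mirrors the documented full_request sample output.
extracted: Dict[str, Any] = {
    "user_data": {
        "request": {
            "url": "https://api.example.com/users/123",
            "method": "GET",
            "headers": {
                "authorization": "Bearer xxx",
                "content-type": "application/json",
            },
            "body": None,
        },
        "response": {
            "status": 200,
            "statusText": "OK",
            "headers": {"content-type": "application/json"},
            "body": {"id": "123", "name": "John Doe"},
        },
        "timestamp": "2024-01-15T10:30:00.000Z",
    }
}


def bearer_token(result: Dict[str, Any], prop: str) -> Optional[str]:
    """Return the bearer token sent with the request, if any."""
    headers = result[prop]["request"]["headers"]
    scheme, _, token = headers.get("authorization", "").partition(" ")
    return token if scheme.lower() == "bearer" and token else None


token = bearer_token(extracted, "user_data")
body = extracted["user_data"]["response"]["body"]
print(token, body["name"])
```

Note that the extracted fields live under `response.body` here, whereas with `full_request` disabled they sit directly under the property name.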

Examples

Basic Usage

Extract user data from an API response:

{
  "url": "/api/v1/users",
  "extract_data_model": {
    "type": "object",
    "properties": {
      "user_id": {
        "type": "string",
        "path": "$.id",
        "selected": true
      },
      "email": {
        "type": "string",
        "path": "$.email",
        "selected": true
      }
    }
  }
}

With Regex URL Matching

Extract specific fields using regex to match dynamic URLs:

{
  "url": "regex:/api/v1/orders/\\d+",
  "extract_data_model": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "path": "$.id",
        "selected": true
      },
      "status": {
        "type": "string",
        "path": "$.order.status",
        "selected": true
      }
    }
  }
}

Common path expressions:

$ — root object
$.field or field — direct field access
$.parent.child or parent.child — nested field
$[0] — first array element
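
To show how the four path forms above select values, here is a minimal Python resolver. It is only a sketch for these simple forms, not a JSONata or JSONPath implementation, and the function name `resolve` is hypothetical.

```python
import re
from typing import Any

# One path step: either an optional dot plus a field name, or [index].
_STEP = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(path: str, data: Any) -> Any:
    """Resolve `$`, `$.a.b`, `a.b`, and `$[0]` style paths against data."""
    rest = path[1:] if path.startswith("$") else path
    current = data
    pos = 0
    while pos < len(rest):
        step = _STEP.match(rest, pos)
        if step is None:
            raise ValueError(f"unsupported path syntax at {rest[pos:]!r}")
        field, index = step.group(1), step.group(2)
        current = current[field] if field is not None else current[int(index)]
        pos = step.end()
    return current


response = {"id": "42", "order": {"status": "shipped"}}
print(resolve("$.id", response))
print(resolve("order.status", response))
print(resolve("$[0]", [response]) is response)
```

Anything beyond these forms (filters, wildcards, functions) raises `ValueError` in this sketch; the node itself supports the full expression language described in the Parameters table.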

Wait for Element First

Wait for a specific element to appear before extracting network data:

{
  "url": "/api/dashboard/data",
  "selector": "//*[@data-loaded='true']",
  "wait_time": 20000,
  "extract_data_model": {
    "type": "object",
    "properties": {
      "dashboard_data": {
        "type": "string",
        "path": "$",
        "selected": true
      }
    }
  }
}

Notes

The node automatically trims whitespace from URL patterns.
When multiple requests match, the most recent match is used for extraction.
Supported response types: JSON, HTML, and XML. Non-JSON responses are returned as raw strings.
Network traffic classified as noise (e.g. analytics, tracking pixels, static assets, health checks) is filtered out from extraction candidates by default. If the request you’re trying to extract doesn’t appear, toggle Filter noise off in the Network Traffic panel to see all captured requests.
