# Extract Network

> Capture and extract data from network requests

The **Extract Network** node intercepts XHR and Fetch requests made by the page and extracts data from their responses. This is useful in two cases: you need to extract a large amount of data and want to reduce runtime, or the data you need is not rendered in the DOM but is available in API responses.

## Parameters

| Parameter                 | Type    | Required | Description                                                                                                                                                                |
| ------------------------- | ------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`                     | string  | Yes      | URL pattern to match against intercepted requests                                                                                                                          |
| `extract_data_model`      | object  | Yes      | Data model schema for extracting specific fields using [JSONata](https://jsonata.org/) (also accepts JSONPath-style `$.field` syntax)                                      |
| `selector`                | string  | No       | XPath expression to wait for before extracting network data                                                                                                                |
| `wait_time`               | number  | No       | Maximum total time (ms) to wait for both the `selector` (if set) to appear and a matching network request to be captured. The two phases share this budget. Default: 15000 |
| `full_request`            | boolean | No       | When true, adds full request/response metadata to the extracted data. Default: false                                                                                       |
| `allow_empty_on_no_match` | boolean | No       | When true, continues with an empty object if no matching request is captured within `wait_time`. Default: false                                                            |
| `api_method`              | string  | No       | HTTP method filter (for example, `GET` or `POST`) applied when matching captured requests                                                                                  |

If no matching request is captured within `wait_time`, the node fails with an error by default rather than proceeding with an empty response. Set `allow_empty_on_no_match` to `true` to continue with `{}` instead. Make sure the request URL is referenced by the `url` parameter of an `EXTRACT_NETWORK` node and that the page actually fires the request as part of the workflow.
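The shared `wait_time` budget and the no-match behavior described above can be sketched in Python. This is an illustration under stated assumptions, not the node's actual implementation: `wait_for_capture`, `selector_present`, and `latest_match` are hypothetical names, and the documentation does not say what happens when the selector never appears, so the sketch simply moves on to the request check with no time left.

```python
import time
from typing import Callable, Optional


def wait_for_capture(
    selector_present: Optional[Callable[[], bool]],
    latest_match: Callable[[], Optional[dict]],
    wait_time_ms: int = 15000,
    allow_empty_on_no_match: bool = False,
    poll_s: float = 0.01,
) -> dict:
    """Hypothetical model of the wait logic; both phases share one deadline."""
    deadline = time.monotonic() + wait_time_ms / 1000

    # Phase 1 (only when a selector is set): wait for the element to appear.
    while selector_present is not None and not selector_present():
        if time.monotonic() >= deadline:
            break  # assumption: fall through to the request check
        time.sleep(poll_s)

    # Phase 2: wait for a matching request using whatever time remains.
    while True:
        match = latest_match()
        if match is not None:
            return match
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_s)

    if allow_empty_on_no_match:
        return {}  # continue with an empty object
    raise TimeoutError("no matching request captured within wait_time")


# No selector, no request ever captured, empty result allowed:
print(wait_for_capture(None, lambda: None, wait_time_ms=50,
                       allow_empty_on_no_match=True))  # {}
```

Because the deadline is computed once, time spent waiting for the selector reduces the time left for the request, which is why a slow-rendering page may need a larger `wait_time`.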

## URL Matching

The node supports exact, substring, and wildcard matching. If there are several matches, the most recent matching request is returned.

### Exact Match

If the provided URL exactly matches one request, the network data of that request is returned.

### Substring Match

If no exact match is found, the node falls back to substring matching. For example, if you set `url` to `/api/users/`, any request whose URL contains `/api/users/` matches, and the most recent such request is returned.

### Wildcard Match

Use `*` for dynamic URL segments or suffixes.

```
/api/v1/coverages/*
```

This matches:

* `/api/v1/coverages/123`
* `/api/v1/coverages/456789`
* `/api/v1/coverages/123?include=details`

But **not**:

* `/api/v1/customers/123`
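The matching rules in this section can be modeled with a short Python sketch. It is a hypothetical approximation, not the node's real matcher: `find_match` is an invented name, and it assumes that `*` matches any run of characters and that a wildcard pattern may appear anywhere in the request URL.

```python
import re
from typing import List, Optional


def find_match(pattern: str, urls: List[str]) -> Optional[str]:
    """Return the most recent URL matching `pattern`, or None.

    `urls` is ordered from oldest to newest captured request.
    """
    pattern = pattern.strip()  # URL patterns are trimmed of whitespace
    if "*" in pattern:
        # Assumption: each `*` stands for any run of characters.
        regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")))
        matches = [u for u in urls if regex.search(u)]
    else:
        # Exact match first, then fall back to substring matching.
        matches = [u for u in urls if u == pattern]
        if not matches:
            matches = [u for u in urls if pattern in u]
    return matches[-1] if matches else None


captured = [
    "https://app.example.com/api/v1/coverages/123",
    "https://app.example.com/api/v1/customers/123",
    "https://app.example.com/api/v1/coverages/123?include=details",
]
print(find_match("/api/v1/coverages/*", captured))
# https://app.example.com/api/v1/coverages/123?include=details
```

In this model the wildcard pattern matches two of the three captured URLs and the later one wins, which mirrors the "most recent matching request" rule.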

## Full Request

When `full_request` is enabled, complete request and response metadata is added directly inside each extracted property. This is useful when you need to capture authentication tokens, cookies, or other request/response details.

### Default Output (full\_request: false)

```json
{
  "user_data": [
    {
      "id": "123",
      "name": "John Doe"
    }
  ]
}
```

### With Full Request (full\_request: true)

When `full_request` is enabled, single-element arrays are unwrapped and the extracted data is placed inside `response.body`:

```json
{
  "user_data": {
    "request": {
      "url": "https://api.example.com/users/123",
      "method": "GET",
      "headers": {
        "authorization": "Bearer xxx",
        "content-type": "application/json"
      },
      "body": null
    },
    "response": {
      "status": 200,
      "statusText": "OK",
      "headers": {
        "content-type": "application/json"
      },
      "body": {
        "id": "123",
        "name": "John Doe"
      }
    },
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
```

The following properties are added:

* **request.url**: The full request URL
* **request.method**: HTTP method (GET, POST, etc.)
* **request.headers**: Request headers sent with the request
* **request.body**: Request body (for POST/PUT requests)
* **response.status**: HTTP status code
* **response.statusText**: HTTP status text
* **response.headers**: Response headers
* **response.body**: The extracted response data
* **timestamp**: When the request was made
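As an illustration of consuming this structure downstream, the Python sketch below reads the bearer token and the response body from the sample output shown above. The helper `bearer_token` is a hypothetical name, and the sketch assumes your workflow result has the same shape as that sample.

```python
import json
from typing import Optional

# Trimmed copy of the `full_request: true` sample output.
output = json.loads("""
{
  "user_data": {
    "request": {
      "url": "https://api.example.com/users/123",
      "method": "GET",
      "headers": {"authorization": "Bearer xxx", "content-type": "application/json"},
      "body": null
    },
    "response": {
      "status": 200,
      "statusText": "OK",
      "headers": {"content-type": "application/json"},
      "body": {"id": "123", "name": "John Doe"}
    },
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
""")


def bearer_token(prop: dict) -> Optional[str]:
    """Return the bearer token from a property's request headers, if any."""
    headers = prop.get("request", {}).get("headers", {})
    # Compare header names case-insensitively, as HTTP header names are.
    for name, value in headers.items():
        if name.lower() == "authorization" and value.startswith("Bearer "):
            return value[len("Bearer "):]
    return None


user = output["user_data"]
print(bearer_token(user))          # xxx
print(user["response"]["status"])  # 200
print(user["response"]["body"])    # {'id': '123', 'name': 'John Doe'}
```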

## Examples

### Basic Usage

Extract user data from an API response:

```json
{
  "id": "abc123",
  "name": "Extract user data",
  "action": "EXTRACT_NETWORK",
  "parameters": {
    "url": "/api/v1/users",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "user_id": {
          "type": "string",
          "path": "$.id",
          "selected": true
        },
        "email": {
          "type": "string",
          "path": "$.email",
          "selected": true
        }
      }
    }
  }
}
```

### With Wildcard URL Matching

Extract specific fields using a wildcard to match dynamic URLs:

```json
{
  "id": "abc123",
  "name": "Extract order status",
  "action": "EXTRACT_NETWORK",
  "parameters": {
    "url": "/api/v1/orders/*",
    "extract_data_model": {
      "type": "object",
      "properties": {
        "order_id": {
          "type": "string",
          "path": "$.id",
          "selected": true
        },
        "status": {
          "type": "string",
          "path": "$.order.status",
          "selected": true
        }
      }
    }
  }
}
```

Common path expressions:

* `$` — root object
* `$.field` or `field` — direct field access
* `$.parent.child` or `parent.child` — nested field
* `$[0]` — first array element
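To make the four forms above concrete, here is a toy resolver in Python. It is only a sketch: the node evaluates paths with JSONata, which supports far more than this, and `resolve` is a hypothetical helper that handles nothing beyond `$`, dotted field access, and numeric indexes.

```python
import re
from typing import Any

# One path step: an optional dot plus a field name, or a numeric index.
_STEP = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(path: str, data: Any) -> Any:
    """Resolve a simple path such as `$.order.status` or `$[0]` against data."""
    rest = path[1:] if path.startswith("$") else path
    current = data
    pos = 0
    while pos < len(rest):
        step = _STEP.match(rest, pos)
        if step is None:
            raise ValueError(f"unsupported path: {path!r}")
        if step.group(1) is not None:
            current = current[step.group(1)]       # field access
        else:
            current = current[int(step.group(2))]  # array index
        pos = step.end()
    return current


response = {"id": "42", "order": {"status": "shipped"}}
print(resolve("$.id", response))          # 42
print(resolve("order.status", response))  # shipped
print(resolve("$[0]", ["a", "b"]))        # a
```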

### Wait for Element First

Wait for a specific element to appear before extracting network data:

```json
{
  "id": "abc123",
  "name": "Extract dashboard data",
  "action": "EXTRACT_NETWORK",
  "parameters": {
    "url": "/api/dashboard/data",
    "selector": "//*[@data-loaded='true']",
    "wait_time": 20000,
    "extract_data_model": {
      "type": "object",
      "properties": {
        "dashboard_data": {
          "type": "string",
          "path": "$",
          "selected": true
        }
      }
    }
  }
}
```

## Notes

* The node automatically trims whitespace from URL patterns
* When multiple requests match, the most recent match is used for extraction
* Supported response types: JSON, HTML, and XML. Non-JSON responses are returned as raw strings.
* Network traffic classified as noise (e.g. analytics, tracking pixels, static assets, health checks) is filtered out from extraction candidates by default. If the request you're trying to extract doesn't appear, toggle **Filter noise** off in the Network Traffic panel to see all captured requests.
