When a run fails, the Maintenance Agent wakes up, labels the failure, and triggers the right recovery path: notify, auto‑repair, or retry.
You can change the recovery settings for each workflow in our platform by navigating to the workflow and then clicking on “Error Handling”.
Error code generation
If the error is a User Error and no existing error code matches, the Maintenance Agent suggests a new error code. You can approve suggested error codes in our platform.
Once approved, the error code is returned in future runs in the execution.failed webhook.
Error‑classification matrix
| Category | Sub‑category | Description |
|---|---|---|
| User Error | PAGE_NOT_FOUND | URL returns 404; usually bad ID/slug |
| | AUTHENTICATION_ERROR | Wrong or expired credentials, unexpected 2FA, captcha, password reset |
| | INCORRECT_FORM_INPUTS | Provided value empty, invalid, or fails validation |
| | PASSWORD_UPDATE_REQUIRED | Site forces password change before further access |
| | ADDITIONAL_USER_INPUT_REQUIRED | Unexpected gating modal that truly blocks progress |
| | MULTIPLE_MATCHING_RESULTS_FOUND | Ambiguous search results require human or AI disambiguation |
| | ACTION_BLOCKED_BY_PLATFORM | Platform rejects duplicate or forbidden action |
| Workflow Error | UNEXPECTED_UI_STATE | Unexpected popup or layout change (DISMISSIBLE / NON_DISMISSIBLE) |
| | XPATH_INCORRECT | Selector matches 0 or >1 elements |
| External Error | SERVICE_UNAVAILABLE | Upstream system down or non‑responsive |
| | PAGE_STILL_LOADING | Page hasn’t finished loading; see Page Loading Recovery |
| | ACCOUNT_LOGGED_OUT | The website logged out the account unexpectedly |
Recovery playbook
| Classification | Action |
|---|---|
| User Error | Surface a dashboard alert + notification; ask the user to correct data or credentials. |
| Workflow Error | Auto‑patch the graph: update selectors or insert waits. |
| External Error | Schedule exponential back‑off retries; no graph change. |
Note that if the workflow is retried, its status is execution.requeued until the run starts again.
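The matrix and playbook above can be read as a two-step lookup: sub-category to category, then category to action. The following is a minimal sketch of that lookup, not the platform's implementation. The action labels (`notify`, `auto_repair`, `retry_with_backoff`) and the function name are hypothetical shorthand for the playbook rows, and the sketch ignores the dedicated recovery features described later on this page.

```python
from typing import Optional

# Sub-category -> category, copied from the error-classification matrix.
CATEGORY_OF = {
    "PAGE_NOT_FOUND": "User Error",
    "AUTHENTICATION_ERROR": "User Error",
    "INCORRECT_FORM_INPUTS": "User Error",
    "PASSWORD_UPDATE_REQUIRED": "User Error",
    "ADDITIONAL_USER_INPUT_REQUIRED": "User Error",
    "MULTIPLE_MATCHING_RESULTS_FOUND": "User Error",
    "ACTION_BLOCKED_BY_PLATFORM": "User Error",
    "UNEXPECTED_UI_STATE": "Workflow Error",
    "XPATH_INCORRECT": "Workflow Error",
    "SERVICE_UNAVAILABLE": "External Error",
    "PAGE_STILL_LOADING": "External Error",
    "ACCOUNT_LOGGED_OUT": "External Error",
}

# Category -> action. The labels are hypothetical shorthand for the playbook.
PLAYBOOK = {
    "User Error": "notify",
    "Workflow Error": "auto_repair",
    "External Error": "retry_with_backoff",
}


def recovery_action(sub_category: str) -> Optional[str]:
    """Return the playbook action for a sub-category, or None if unknown."""
    category = CATEGORY_OF.get(sub_category)
    return PLAYBOOK.get(category) if category else None
```

For example, `recovery_action("XPATH_INCORRECT")` resolves to the Workflow Error row, while an unrecognized sub-category returns `None`.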
Incorrect Form Input Recovery
When a workflow fails because of wrong input data (e.g. a bad patient ID), the Maintenance Agent can request corrected data and resume the workflow without restarting.
How it works
- The error is classified as INCORRECT_FORM_INPUTS (or a matched error code has the “Request New Input” action)
- An execution.input_required webhook is sent with the session ID, current input variables, and a screenshot
- The system waits for your AI agent or automation to call POST /run/{session_id}/new_input_variables with corrected values
- Once received, the workflow resumes from the node that uses the corrected variable
- If no response arrives within the timeout, the session fails as normal
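On the receiving side, the steps above amount to: read the webhook, correct the inputs, and call the endpoint before the timeout. The sketch below builds that request with Python's standard library. Only the path `/run/{session_id}/new_input_variables` comes from this page. The base URL, the `Authorization` header, the `input_variables` body key, and the webhook field names are assumptions made for illustration, so check the Submit New Input Variables reference for the real ones.

```python
import json
import urllib.request
from typing import Callable

# Placeholder host; the real API base URL is not stated on this page.
BASE_URL = "https://api.example.com"


def build_new_input_request(
    session_id: str, corrected: dict, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the POST carrying corrected input values."""
    # The body key "input_variables" is an assumption, not a documented field.
    body = json.dumps({"input_variables": corrected}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/run/{session_id}/new_input_variables",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def handle_input_required(
    event: dict, fix_inputs: Callable[[dict], dict], api_key: str
) -> urllib.request.Request:
    """React to an execution.input_required webhook.

    `event` is assumed to expose `session_id` and `input_variables`;
    `fix_inputs` is your own function that returns corrected values.
    """
    corrected = fix_inputs(event["input_variables"])
    return build_new_input_request(event["session_id"], corrected, api_key)
```

Sending the result with `urllib.request.urlopen(request)` would submit the corrected values. Keeping request construction separate from sending makes the handler easy to test without network access.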
Configuration
- Enable “Incorrect Form Input Recovery” in Error Handling > Maintenance Agent Settings
- Or set a specific error code’s action to “Request New Input” in Error Handling > Error Codes
- The wait timeout is configurable per workspace via input_required_timeout_seconds (default 15s, minimum 5s, maximum 300s). The wait is interruptible via the Interrupt Run endpoint.
See Submit New Input Variables for the full API reference.
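If you manage this setting from your own tooling, a small guard can catch out-of-range values before they reach the platform. This is a sketch under the bounds stated above. The function name is hypothetical, and whether the platform rejects or clamps out-of-range values is not specified here, so the sketch simply raises.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 15
MIN_TIMEOUT_S = 5
MAX_TIMEOUT_S = 300


def validate_input_required_timeout(seconds: Optional[int] = None) -> int:
    """Return a timeout within the documented bounds, or raise ValueError."""
    if seconds is None:
        return DEFAULT_TIMEOUT_S
    if not MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S:
        raise ValueError(
            f"input_required_timeout_seconds must be between "
            f"{MIN_TIMEOUT_S} and {MAX_TIMEOUT_S}, got {seconds}"
        )
    return int(seconds)
```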
Node Description Enrichment
The Maintenance Agent classifies errors more accurately when it understands what each workflow step does. Workflows without descriptions are 5x more likely to produce unclassified (NOT_CONFIDENT) errors, reducing the agent’s ability to trigger the correct recovery action.
Node Description Enrichment automatically generates diagnostic descriptions for your workflow steps and an overall workflow summary. It runs on workflows with missing or very short descriptions that have high error rates and frequent unclassified errors, closing this gap without any manual effort.
How it works
- The system periodically analyzes screenshots from successful runs to understand what each step does
- A 2–3 sentence description is generated per node — covering what page it operates on, what UI element it targets, and what success looks like
- A workflow-level summary is generated from all enriched node descriptions
- Descriptions are written directly into your workflow and used by the Maintenance Agent for all future error classification
Configuration
Enable “Node Description Enrichment” in Error Handling > Maintenance Agent Settings. Once enabled, any node with no description or a very short description (under 30 characters) will be enriched on the next enrichment cycle. The same applies to the overall workflow description.
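To estimate which nodes the next enrichment cycle would touch, you can apply the same 30-character rule to an exported workflow. The sketch below assumes a hypothetical node shape (a dict with `id` and an optional `description`) and assumes surrounding whitespace does not count toward the length. Neither detail is confirmed by this page.

```python
from typing import Optional

MIN_DESCRIPTION_CHARS = 30  # threshold stated in the Configuration text


def needs_enrichment(description: Optional[str]) -> bool:
    """True when a description is missing or shorter than 30 characters."""
    # Stripping whitespace before measuring is an assumption.
    return description is None or len(description.strip()) < MIN_DESCRIPTION_CHARS


def nodes_to_enrich(nodes: list) -> list:
    """Return the ids of nodes whose description would be regenerated."""
    return [n["id"] for n in nodes if needs_enrichment(n.get("description"))]
```

The same `needs_enrichment` check can be applied to the overall workflow description, since the page states that the same rule applies there.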
Service Unavailable Recovery
When a workflow encounters an external service unavailability (503 errors, timeouts, maintenance pages), the Maintenance Agent can automatically retry the workflow using exponential backoff.
Retry Schedule
The delay between retries follows the formula 10 minutes × 2^n, where n is the zero-based retry index (retry 1 uses n = 0), with ±20% jitter to prevent thundering-herd issues.
| Retry | Approximate Delay |
|---|---|
| 1 | ~10 minutes |
| 2 | ~20 minutes |
| 3 | ~40 minutes |
| 4 | ~1.3 hours |
| 5 | ~2.7 hours |
| 6 | ~5.3 hours |
| 7 | ~10.7 hours |
| 8 | ~21 hours |
| 9 | ~1.8 days |
| 10 | ~3.5 days |
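The schedule above can be reproduced with a few lines of Python, which is handy when estimating how long a run may stay requeued. This is a sketch of the stated formula only. Drawing the jitter uniformly from the ±20% range is an assumption, since the page does not say how the jitter is sampled, and the function names are illustrative.

```python
import random

BASE_DELAY_MINUTES = 10
JITTER_FRACTION = 0.20


def nominal_delay_minutes(n: int) -> int:
    """Delay before jitter for zero-based retry index n: 10 minutes x 2^n."""
    return BASE_DELAY_MINUTES * 2 ** n


def jittered_delay_minutes(n: int, rng: random.Random) -> float:
    """Apply +/-20% jitter; uniform sampling is an assumption."""
    factor = 1 + rng.uniform(-JITTER_FRACTION, JITTER_FRACTION)
    return nominal_delay_minutes(n) * factor


def schedule(max_retries: int) -> list:
    """Nominal delays in minutes for retries 1..max_retries."""
    return [nominal_delay_minutes(i) for i in range(max_retries)]
```

Summing `schedule(10)` gives the nominal worst-case time spent waiting across all ten retries, before jitter.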
Configuration
You can configure the maximum number of retry attempts (0-10) per workflow in the Error Handling settings. Navigate to your workflow → Error Handling → Maintenance Agent Settings → “Maximum Error Recovery Attempts”.
Backstop: Webhook Notification
When all retry attempts are exhausted and the workflow still fails, an execution.failed webhook is sent to your registered webhook URL. This allows you to:
- Alert your team via PagerDuty, Slack, or other alerting systems
- Queue the failure for manual review
- Trigger alternative fallback logic on your end
Page Loading Recovery
When a workflow action fails because the page hasn’t finished loading — spinner overlays, skeleton placeholders, lazy-loaded content not yet rendered — the Maintenance Agent detects the issue, waits, retries, and if successful, permanently injects a delay into the workflow so future runs never hit the same problem.
Page Loading Recovery consolidates the former ACTION_PERFORMED_TOO_EARLY category into PAGE_STILL_LOADING with expanded sub-types. Any legacy references to ACTION_PERFORMED_TOO_EARLY are automatically normalized.
Sub-types
The Maintenance Agent classifies page loading failures into specific sub-types to determine the appropriate wait duration.
| Sub-Type | Description | Wait Duration |
|---|---|---|
| SPINNER_VISIBLE | Full-page or section spinner / loading indicator is visible | 15s |
| CONTENT_AREA_EMPTY | Main content area is blank during load | 15s |
| NAVIGATION_IN_PROGRESS | Page transition still happening | 15s |
| COMPONENT_LOADING | Individual component still rendering | 10s |
| PARTIAL_PAGE_LOAD | Page partially rendered, some sections not ready | 10s |
| LAZY_CONTENT_PENDING | Lazy-loaded content (dropdowns, search results) not populated | 10s |
Sub-types where the page is partially ready (COMPONENT_LOADING, PARTIAL_PAGE_LOAD, LAZY_CONTENT_PENDING) receive a shorter 10-second delay. Sub-types indicating the page is still broadly loading receive 15 seconds.
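Expressed as data, the sub-type table and the legacy-name normalization look roughly like this. The values come from the table above, while the function names and the choice to raise on an unknown sub-type are illustrative assumptions.

```python
# Wait duration in seconds per sub-type, copied from the table above.
WAIT_SECONDS = {
    "SPINNER_VISIBLE": 15,
    "CONTENT_AREA_EMPTY": 15,
    "NAVIGATION_IN_PROGRESS": 15,
    "COMPONENT_LOADING": 10,
    "PARTIAL_PAGE_LOAD": 10,
    "LAZY_CONTENT_PENDING": 10,
}

# The former category name is normalized to the current one.
LEGACY_CATEGORIES = {"ACTION_PERFORMED_TOO_EARLY": "PAGE_STILL_LOADING"}


def normalize_category(category: str) -> str:
    """Map a legacy category name to its current equivalent."""
    return LEGACY_CATEGORIES.get(category, category)


def wait_for_subtype(sub_type: str) -> int:
    """Return the wait in seconds; raises KeyError for unknown sub-types."""
    return WAIT_SECONDS[sub_type]
```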
Recovery pipeline
Recovery proceeds through three tiers. Each tier is attempted only once per node per session to prevent infinite loops.
Tier 1 — Delay and retry: When PAGE_STILL_LOADING is detected, a temporary Delay node is injected before the failing node. The workflow resumes from the delay node and retries the original action. If the retry succeeds, the delay is permanently injected into the workflow.
Tier 2 — Retrace to earlier node: If the same node fails again after the Tier 1 delay, an LLM analyzes screenshots to identify an earlier node the workflow can safely resume from. A delay is injected before that earlier node and the workflow resumes from there.
Tier 3 — Requeue: If both tiers fail, the session falls back to a full requeue (session restart), governed by the workflow’s Service Unavailable Recovery settings and maximum retry attempts.
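The once-per-node-per-session rule is what keeps the pipeline from looping. One way to picture it is a tracker that hands out each tier at most once per node. This is a conceptual sketch with hypothetical names, not the platform's implementation, and it leaves out the extra conditions on Tier 3 (the maximum retry attempts and the Service Unavailable Recovery setting).

```python
from typing import Optional

TIERS = ("delay_and_retry", "retrace_to_earlier_node", "requeue")


class TierTracker:
    """Tracks which recovery tiers were already used per node in a session."""

    def __init__(self) -> None:
        # node id -> number of tiers already attempted for that node
        self._used = {}

    def next_tier(self, node_id: str) -> Optional[str]:
        """Return the next untried tier for this node, or None if exhausted."""
        index = self._used.get(node_id, 0)
        if index >= len(TIERS):
            return None
        self._used[node_id] = index + 1
        return TIERS[index]
```

Calling `next_tier` repeatedly for the same node walks through the three tiers in order and then returns `None`, while a different node starts again at Tier 1.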
Permanent workflow update
After a successful Tier 1 recovery, the system permanently increases the wait_time on the failing node so future runs give the page more time to load before timing out:
- The failing node completes successfully after the injected delay
- The system checks that the node’s wait_time hasn’t already been bumped
- The node’s wait_time is increased from the default 15s to 15s + the recovery delay (e.g. 25s or 30s depending on sub-type)
- A new workflow version is created with a descriptive note, e.g.:
Auto-increased wait_time to 30s on "Click New Patient" (page loading recovery)
Unlike a static delay node, wait_time is adaptive — the node polls for the target element and proceeds as soon as it appears. If the page loads in 500ms, the node finishes in 500ms. The increased timeout is only a ceiling, not a fixed sleep.
The wait_time bump is safe and non-destructive. Each update creates a new workflow version, so you can always review or revert from the workflow version history.
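The arithmetic of the bump and the "ceiling, not a fixed sleep" behavior can be sketched as follows. Both functions are illustrative: detecting an earlier bump by comparing against the 15s default is an assumption about how the check works, and `is_present` stands in for whatever element lookup the node performs.

```python
import time
from typing import Callable

DEFAULT_WAIT_TIME_S = 15


def bumped_wait_time(recovery_delay_s: int, current_s: int = DEFAULT_WAIT_TIME_S) -> int:
    """Return the new wait_time, leaving already-bumped nodes unchanged."""
    # Assumption: a value above the default means the node was bumped before.
    if current_s > DEFAULT_WAIT_TIME_S:
        return current_s
    return DEFAULT_WAIT_TIME_S + recovery_delay_s


def wait_for_element(
    is_present: Callable[[], bool], wait_time_s: float, poll_interval_s: float = 0.1
) -> bool:
    """Poll until is_present() is true or wait_time_s elapses."""
    deadline = time.monotonic() + wait_time_s
    while True:
        if is_present():
            return True  # proceed immediately; the timeout is only a ceiling
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

With a 10s recovery delay the bump yields 25s, and with 15s it yields 30s, matching the examples above. `wait_for_element` returns as soon as the element is found, so a larger `wait_time_s` costs nothing on fast page loads.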
Configuration
Page Loading Recovery is enabled automatically for all workflows. No additional configuration is required. The recovery respects your existing error handling settings — Maximum Error Recovery Attempts controls how many times Tier 3 (requeue) can retry, and Service Unavailable Recovery must be enabled for Tier 3 fallback to activate.