When a run fails, the Maintenance Agent wakes up, labels the failure, and triggers the right recovery path - notify, auto‑repair, or retry. You can change the recovery settings for each workflow in our platform by navigating to the workflow and then clicking on “Error Handling”.

Error code generation

If the error is a User Error and no existing error code matches, the Maintenance Agent suggests a new error code. You can approve suggested error codes in our platform. Once approved, the error code is returned in the execution.failed webhook for future runs.

Error‑classification matrix

Every error is classified into a category and, where applicable, a sub‑type. Both are returned in the execution.failed webhook and the Retrieve Run Results response as llm_error_category and llm_error_sub_type.
| Category | Sub‑types | Description |
| --- | --- | --- |
| AUTHENTICATION_ERROR | INVALID_CREDENTIALS, MFA_CODE_REJECTED, TFA_SETUP_REQUIRED, ACCOUNT_LOCKED, SILENT_LOGIN_FAILURE, ACCESS_DENIED | Wrong or expired credentials, failed 2FA, captcha gate, or account locked |
| INCORRECT_FORM_INPUTS | NO_MATCHING_OPTION, EMPTY_SELECTOR_VALUE, FIELD_VALIDATION_ERROR, UNRESOLVED_TEMPLATE, PAGINATION_MISSING, MULTIPLE_MATCHING_RESULTS_FOUND | Provided value is empty, invalid, fails validation, or matches more than one option |
| PASSWORD_UPDATE_REQUIRED | | Site forces a password change before further access |
| PAGE_NOT_FOUND | | URL returns 404; usually a bad ID or slug in the input |
| UNEXPECTED_UI_STATE | DISMISSIBLE, DECISION_REQUIRED | Unexpected popup or layout change |
| XPATH_INCORRECT | SELECTOR_NO_MATCH, AMBIGUOUS_MATCHES, SELECTOR_MISMATCH, TABLE_SELECTOR_MISMATCH | XPath selector matches zero or multiple elements |
| UPSTREAM_ERROR | PREREQUISITE_NOT_MET, TIMING_RACE, TRANSIENT_SITE_ISSUE, SILENT_UPSTREAM_FAILURE | Site passed login but a subsequent step failed due to upstream data, timing, or a transient site‑side issue |
| SERVICE_UNAVAILABLE | APPLICATION_ERROR, ELEMENT_TIMEOUT, HTTP_ERROR, MAINTENANCE_DOWNTIME | Upstream system is down, returning errors, or under maintenance |
| PAGE_STILL_LOADING | SPINNER_VISIBLE, CONTENT_AREA_EMPTY, NAVIGATION_IN_PROGRESS, COMPONENT_LOADING, PARTIAL_PAGE_LOAD, LAZY_CONTENT_PENDING | Page hasn’t finished loading; see Page Loading Recovery |
| ACCOUNT_LOGGED_OUT | SESSION_INVALIDATED, REDIRECTED_TO_LOGIN, SESSION_EXPIRED | The website logged out the account mid‑workflow |
Each category may also return a sub‑type of OTHER when the error fits the category but does not match a more specific sub‑type.
Legacy category names UNEXPECTED_POPUP, ADDITIONAL_USER_INPUT_REQUIRED, and ACTION_BLOCKED_BY_PLATFORM are automatically normalized to their current equivalents. You do not need to update existing error code mappings.
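As a rough illustration of consuming these fields, the sketch below reads llm_error_category and llm_error_sub_type from a decoded payload. The two field names come from this page; the assumption that they sit at the top level of the JSON object, and the helper names parse_classification and is_known_category, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Category names as listed in the classification matrix above.
KNOWN_CATEGORIES = {
    "AUTHENTICATION_ERROR", "INCORRECT_FORM_INPUTS", "PASSWORD_UPDATE_REQUIRED",
    "PAGE_NOT_FOUND", "UNEXPECTED_UI_STATE", "XPATH_INCORRECT", "UPSTREAM_ERROR",
    "SERVICE_UNAVAILABLE", "PAGE_STILL_LOADING", "ACCOUNT_LOGGED_OUT",
}


@dataclass(frozen=True)
class Classification:
    category: Optional[str]
    sub_type: Optional[str]


def parse_classification(payload: dict) -> Classification:
    """Read the two classification fields from a decoded JSON payload.

    Assumption: both fields are top-level keys. Adjust the lookup if the
    real payload nests them.
    """
    return Classification(
        category=payload.get("llm_error_category"),
        sub_type=payload.get("llm_error_sub_type"),
    )


def is_known_category(classification: Classification) -> bool:
    """True when the category is one of those documented in the matrix."""
    return classification.category in KNOWN_CATEGORIES
```

Checking against the known set lets a consumer fall back to a generic alert when a category it does not recognize appears.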

Recovery playbook

| Category | Recovery action |
| --- | --- |
| AUTHENTICATION_ERROR | Notification sent. If sub‑type is TFA_SETUP_REQUIRED and TFA Setup Recovery is enabled, the agent handles 2FA enrollment automatically. |
| INCORRECT_FORM_INPUTS | If Incorrect Form Input Recovery is enabled, sends execution.input_required webhook and waits for corrected values. Otherwise, notification sent. |
| PASSWORD_UPDATE_REQUIRED | If Password Update Recovery is enabled, the agent rotates the password, updates the vault, and resumes. Otherwise, notification sent. |
| PAGE_NOT_FOUND | Notification sent. |
| UNEXPECTED_UI_STATE | Popup handling: auto‑dismiss or notification. |
| XPATH_INCORRECT | If XPath Recovery is enabled, the agent auto‑repairs the selector and resumes. |
| UPSTREAM_ERROR | Notification sent. |
| SERVICE_UNAVAILABLE | If Service Unavailable Recovery is enabled, schedules exponential back‑off retries. |
| PAGE_STILL_LOADING | Automatic Page Loading Recovery: delay injection, retrace, or requeue. |
| ACCOUNT_LOGGED_OUT | Notification sent. The run may be retried if Service Unavailable Recovery is enabled. |
Note that if the workflow is retried, its status will be execution.requeued until the run is started again. An execution.requeued webhook is sent with the retry attempt number and scheduled execution time.
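The playbook can be read as a lookup from category, sub‑type, and enabled settings to an action. The sketch below restates it in code purely for illustration. RecoverySettings, recovery_action, and the returned action strings are hypothetical names, and two branches are assumptions the playbook does not spell out: that a DISMISSIBLE sub‑type leads to auto‑dismiss, and that XPATH_INCORRECT and SERVICE_UNAVAILABLE fall back to a notification when their recovery setting is off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoverySettings:
    """Hypothetical per-workflow toggles mirroring the settings named above."""
    incorrect_form_input_recovery: bool = False
    password_update_recovery: bool = False
    tfa_setup_recovery: bool = False
    xpath_recovery: bool = False
    service_unavailable_recovery: bool = False


def recovery_action(category: str, sub_type: Optional[str], settings: RecoverySettings) -> str:
    """Return an illustrative action label for a classified error."""
    if category == "AUTHENTICATION_ERROR":
        if sub_type == "TFA_SETUP_REQUIRED" and settings.tfa_setup_recovery:
            return "tfa_setup"
        return "notify"
    if category == "INCORRECT_FORM_INPUTS":
        return "request_input" if settings.incorrect_form_input_recovery else "notify"
    if category == "PASSWORD_UPDATE_REQUIRED":
        return "rotate_password" if settings.password_update_recovery else "notify"
    if category == "UNEXPECTED_UI_STATE":
        # Assumption: the sub-type decides between auto-dismiss and notification.
        return "auto_dismiss" if sub_type == "DISMISSIBLE" else "notify"
    if category == "XPATH_INCORRECT":
        # Assumption: notification when XPath Recovery is disabled.
        return "repair_selector" if settings.xpath_recovery else "notify"
    if category == "SERVICE_UNAVAILABLE":
        # Assumption: notification when Service Unavailable Recovery is disabled.
        return "retry_with_backoff" if settings.service_unavailable_recovery else "notify"
    if category == "PAGE_STILL_LOADING":
        return "page_loading_recovery"
    # PAGE_NOT_FOUND, UPSTREAM_ERROR, ACCOUNT_LOGGED_OUT, and unrecognized categories.
    return "notify"
```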

Incorrect Form Input Recovery

When a workflow fails because of wrong input data (e.g. a bad patient ID), the Maintenance Agent can request corrected data and resume the workflow without restarting.
How it works
  1. The error is classified as INCORRECT_FORM_INPUTS (or a matched error code has the “Request New Input” action)
  2. An execution.input_required webhook is sent with the session ID, current input variables, and a screenshot
  3. The system waits for your AI agent or automation to call POST /run/{session_id}/new_input_variables with corrected values
  4. Once received, the workflow resumes from the node that uses the corrected variable
  5. If no response arrives within the timeout, the session fails as normal
Configuration
  • Enable “Incorrect Form Input Recovery” in Error Handling > Maintenance Agent Settings
  • Or set a specific error code’s action to “Request New Input” in Error Handling > Error Codes
  • The wait timeout is configurable per workspace via input_required_timeout_seconds (default 15s, minimum 5s, maximum 300s). The wait is interruptible via the Interrupt Run endpoint.
See Submit New Input Variables for the full API reference.
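For orientation, here is a minimal sketch of how an automation might build the request for step 3. Only the path POST /run/{session_id}/new_input_variables comes from this page. The base URL, the bearer-token header, and the input_variables body key are placeholders and assumptions, so check the Submit New Input Variables reference for the actual contract.

```python
import json
import urllib.parse
import urllib.request

# Placeholder base URL, not a real endpoint.
BASE_URL = "https://api.example.com"


def build_new_input_request(session_id: str, corrected: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the request that submits corrected input values.

    Assumptions: JSON body under an "input_variables" key and bearer-token auth.
    """
    url = f"{BASE_URL}/run/{urllib.parse.quote(session_id, safe='')}/new_input_variables"
    body = json.dumps({"input_variables": corrected}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Because the wait times out (15 seconds by default), the caller needs to respond promptly after receiving execution.input_required.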

Password Update Recovery

When the target site forces a password change (expired credentials, mandatory rotation), the Maintenance Agent can automatically generate a new password, complete the form, update the vault, and resume the workflow.
How it works
  1. The error is classified as PASSWORD_UPDATE_REQUIRED
  2. The agent analyzes the password change form via screenshot to determine the form type
  3. If the form is dismissible (optional password change prompt), the agent clicks the dismiss button and continues
  4. If the form requires a password change, the agent retrieves the current credential from the vault, generates a new strong password, and fills in the form fields
  5. The vault credential is updated with the new password
  6. An execution.password_updated webhook is sent
  7. The workflow resumes from the next step
Configuration
Enable “Password Update Recovery” in Error Handling > Maintenance Agent Settings, or set enable_password_update_recovery to true via the Workflow API.
Password Update Recovery only works for workflows with a single vault credential. Workflows with multiple credentials skip automatic recovery and send a notification instead.
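This page does not describe how the agent generates the new strong password. As a generic illustration only, the sketch below uses Python's secrets module. The length, character classes, and symbol set are assumptions rather than the platform's policy, and generate_password is a hypothetical name.

```python
import secrets
import string

# Assumed symbol set; real sites often restrict which symbols are allowed.
SYMBOLS = "!@#$%^&*-_"
CHAR_CLASSES = [string.ascii_lowercase, string.ascii_uppercase, string.digits, SYMBOLS]


def generate_password(length: int = 20) -> str:
    """Generate a random password containing at least one character per class."""
    if length < len(CHAR_CLASSES):
        raise ValueError("length must be at least the number of character classes")
    # One guaranteed character from each class, then fill from the full alphabet.
    chars = [secrets.choice(cls) for cls in CHAR_CLASSES]
    alphabet = "".join(CHAR_CLASSES)
    chars += [secrets.choice(alphabet) for _ in range(length - len(CHAR_CLASSES))]
    # Shuffle with a CSPRNG so the guaranteed characters are not in fixed positions.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```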

TFA Setup Recovery

When the target site requires two‑factor authentication enrollment during a run (e.g., a mandatory authenticator setup screen), the Maintenance Agent can complete the setup automatically and store the 2FA secret in the vault for future runs.
Supported scenarios
| Scenario | Recovery action |
| --- | --- |
| Authenticator app setup | Reads the TOTP secret from the QR code or setup page, enters the current code, and saves the secret to the vault |
| SMS verification | Enters the verification code using the CC 2FA Proxy |
| Email verification | Enters the verification code using the CC 2FA Proxy |
| Dismissible prompt | Clicks the skip/dismiss button if 2FA setup is optional |
Configuration
Enable “TFA Setup Recovery” in Error Handling > Maintenance Agent Settings, or set enable_tfa_setup_recovery to true via the Workflow API.
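To make the authenticator app scenario concrete, the sketch below derives the current code from a base32 TOTP secret using the standard RFC 6238 algorithm (HMAC-SHA1, 30-second period). It shows the general technique rather than the agent's internals, and it assumes the site uses those common defaults. The function name totp is hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, period: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    # Normalize the secret: drop spaces, uppercase, and restore any stripped padding.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole periods since the Unix epoch.
    now = time.time() if at is None else at
    counter = int(now // period)

    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): low nibble of the last byte selects the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % (10 ** digits)).zfill(digits)
```

Once the secret is stored, the same computation can produce a fresh code on each later login, which is why saving it to the vault lets future runs pass 2FA.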

Node Description Enrichment

The Maintenance Agent classifies errors more accurately when it understands what each workflow step does. Workflows without descriptions are 5x more likely to produce unclassified (NOT_CONFIDENT) errors, reducing the agent’s ability to trigger the correct recovery action.
Node Description Enrichment automatically generates diagnostic descriptions for your workflow steps and an overall workflow summary. It runs on workflows with missing or very short descriptions that have high error rates and frequent unclassified errors, closing this gap without any manual effort.
How it works
  1. The system periodically analyzes screenshots from successful runs to understand what each step does
  2. A 2–3 sentence description is generated per node — covering what page it operates on, what UI element it targets, and what success looks like
  3. A workflow-level summary is generated from all enriched node descriptions
  4. Descriptions are written directly into your workflow and used by the Maintenance Agent for all future error classification
Configuration
Enable “Node Description Enrichment” in Error Handling > Maintenance Agent Settings. Once enabled, any node with no description or a very short description (under 30 characters) will be enriched on the next enrichment cycle. The same applies to the overall workflow description.
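The eligibility rule can be written as a one-line check. This is a sketch only: needs_enrichment is a hypothetical name, and ignoring surrounding whitespace when measuring length is an assumption.

```python
from typing import Optional

# Threshold stated above: descriptions under 30 characters count as very short.
MIN_DESCRIPTION_LENGTH = 30


def needs_enrichment(description: Optional[str]) -> bool:
    """True when a node or workflow description is missing or very short."""
    if description is None:
        return True
    # Assumption: leading and trailing whitespace does not count toward the length.
    return len(description.strip()) < MIN_DESCRIPTION_LENGTH
```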

Service Unavailable Recovery

When a workflow encounters an external service outage (503 errors, timeouts, maintenance pages), the Maintenance Agent can automatically retry the workflow using exponential backoff.
Retry Schedule
The delay before each retry follows the formula 10 minutes × 2^n, where n is the zero-based retry index (n = 0 for the first retry, n = 9 for the tenth), with ±20% jitter to prevent thundering-herd issues.
| Retry | Approximate delay |
| --- | --- |
| 1 | ~10 minutes |
| 2 | ~20 minutes |
| 3 | ~40 minutes |
| 4 | ~1.3 hours |
| 5 | ~2.7 hours |
| 6 | ~5.3 hours |
| 7 | ~10.7 hours |
| 8 | ~21 hours |
| 9 | ~1.8 days |
| 10 | ~3.5 days |
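The schedule above can be reproduced with a few lines of arithmetic. In this sketch the retry number is 1-based, so retry k uses n = k - 1. Drawing the jitter factor uniformly from the ±20% range is an assumption, since this page gives only the range.

```python
import random
from typing import Optional

BASE_DELAY_MINUTES = 10
JITTER_FRACTION = 0.20


def nominal_delay_minutes(retry_number: int) -> float:
    """Delay before the given 1-based retry, without jitter: 10 min x 2^(retry - 1)."""
    if retry_number < 1:
        raise ValueError("retry_number is 1-based")
    return BASE_DELAY_MINUTES * 2 ** (retry_number - 1)


def jittered_delay_minutes(retry_number: int, rng: Optional[random.Random] = None) -> float:
    """Nominal delay scaled by a random factor in [0.8, 1.2] (assumed uniform)."""
    rng = rng or random.Random()
    factor = rng.uniform(1 - JITTER_FRACTION, 1 + JITTER_FRACTION)
    return nominal_delay_minutes(retry_number) * factor
```

For example, nominal_delay_minutes(4) is 80 minutes (about 1.3 hours) and nominal_delay_minutes(10) is 5120 minutes (about 3.5 days), matching the table.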
Configuration
You can configure the maximum number of retry attempts (0-10) per workflow in the Error Handling settings. Navigate to your workflow → Error Handling → Maintenance Agent Settings → “Maximum Error Recovery Attempts”.
Backstop: Webhook Notification
When all retry attempts are exhausted and the workflow still fails, an execution.failed webhook is sent to your registered webhook URL. This allows you to:
  • Alert your team via PagerDuty, Slack, or other alerting systems
  • Queue the failure for manual review
  • Trigger alternative fallback logic on your end

Page Loading Recovery

When a workflow action fails because the page hasn’t finished loading — spinner overlays, skeleton placeholders, lazy-loaded content not yet rendered — the Maintenance Agent detects the issue, waits, retries, and if successful, permanently injects a delay into the workflow so future runs never hit the same problem.
Page Loading Recovery consolidates the former ACTION_PERFORMED_TOO_EARLY category into PAGE_STILL_LOADING with expanded sub-types. Any legacy references to ACTION_PERFORMED_TOO_EARLY are automatically normalized.
Sub-types
The Maintenance Agent classifies page loading failures into specific sub-types to determine the appropriate wait duration.
| Sub-type | Description | Wait duration |
| --- | --- | --- |
| SPINNER_VISIBLE | Full-page or section spinner / loading indicator is visible | 15s |
| CONTENT_AREA_EMPTY | Main content area is blank during load | 15s |
| NAVIGATION_IN_PROGRESS | Page transition still happening | 15s |
| COMPONENT_LOADING | Individual component still rendering | 10s |
| PARTIAL_PAGE_LOAD | Page partially rendered, some sections not ready | 10s |
| LAZY_CONTENT_PENDING | Lazy-loaded content (dropdowns, search results) not populated | 10s |
Sub-types where the page is partially ready (COMPONENT_LOADING, PARTIAL_PAGE_LOAD, LAZY_CONTENT_PENDING) receive a shorter 10-second delay. Sub-types indicating the page is still broadly loading receive 15 seconds.
Recovery pipeline
Recovery proceeds through three tiers. Each tier is attempted only once per node per session to prevent infinite loops.
  • Tier 1 (Delay and retry): When PAGE_STILL_LOADING is detected, a temporary Delay node is injected before the failing node. The workflow resumes from the delay node and retries the original action. If the retry succeeds, the delay is permanently injected into the workflow.
  • Tier 2 (Retrace to earlier node): If the same node fails again after the Tier 1 delay, an LLM analyzes screenshots to identify an earlier node the workflow can safely resume from. A delay is injected before that earlier node and the workflow resumes from there.
  • Tier 3 (Requeue): If both tiers fail, the session falls back to a full requeue (session restart), governed by the workflow’s Service Unavailable Recovery settings and maximum retry attempts.
Permanent workflow update
After a successful Tier 1 recovery, the system permanently increases the wait_time on the failing node so future runs give the page more time to load before timing out:
  1. The failing node completes successfully after the injected delay
  2. The system checks that the node’s wait_time hasn’t already been bumped
  3. The node’s wait_time is increased from the default 15s to 15s + the recovery delay (e.g. 25s or 30s depending on sub-type)
  4. A new workflow version is created with a descriptive note, e.g.: Auto-increased wait_time to 30s on "Click New Patient" (page loading recovery)
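The arithmetic in steps 2 and 3 can be sketched as follows. The per-sub-type delays and the 15s default come from this page. The function name and the already_bumped flag are hypothetical stand-ins for however the system tracks a previous bump.

```python
DEFAULT_WAIT_TIME_S = 15

# Recovery delay per sub-type, as listed in the sub-types table.
RECOVERY_DELAY_S = {
    "SPINNER_VISIBLE": 15,
    "CONTENT_AREA_EMPTY": 15,
    "NAVIGATION_IN_PROGRESS": 15,
    "COMPONENT_LOADING": 10,
    "PARTIAL_PAGE_LOAD": 10,
    "LAZY_CONTENT_PENDING": 10,
}


def bumped_wait_time(sub_type: str,
                     current_wait_time_s: int = DEFAULT_WAIT_TIME_S,
                     already_bumped: bool = False) -> int:
    """Return the node's new wait_time after a successful Tier 1 recovery."""
    if already_bumped:
        # Step 2: a node whose wait_time was already increased is left alone.
        return current_wait_time_s
    # Step 3: add the recovery delay for this sub-type to the current timeout.
    return current_wait_time_s + RECOVERY_DELAY_S[sub_type]
```

With the default of 15s this yields 30s for SPINNER_VISIBLE and 25s for COMPONENT_LOADING, the two values given in step 3.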
Unlike a static delay node, wait_time is adaptive — the node polls for the target element and proceeds as soon as it appears. If the page loads in 500ms, the node finishes in 500ms. The increased timeout is only a ceiling, not a fixed sleep.
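The difference between a polling timeout and a fixed sleep can be shown with a generic wait loop. This is an illustration of the pattern, not the platform's implementation. wait_for, its parameters, and the 100 ms poll interval are all hypothetical.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout_s: float,
             poll_interval_s: float = 0.1) -> bool:
    """Poll until condition() is true or timeout_s elapses.

    Returns as soon as the condition holds, so the timeout is only a ceiling.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))
```

A call such as wait_for(element_is_present, timeout_s=30) returns on the first successful poll, whereas time.sleep(30) would always block for the full 30 seconds.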
The wait_time bump is safe and non-destructive. Each update creates a new workflow version, so you can always review or revert from the workflow version history.
Configuration
Page Loading Recovery is enabled automatically for all workflows. No additional configuration is required. The recovery respects your existing error handling settings: Maximum Error Recovery Attempts controls how many times Tier 3 (requeue) can retry, and Service Unavailable Recovery must be enabled for Tier 3 fallback to activate.
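Putting the tier rules and these settings together, tier selection for a failing node might look like the sketch below. It restates the documented rules under assumptions: next_tier and its parameters are hypothetical, and real bookkeeping is per node per session.

```python
from typing import Optional, Set


def next_tier(tiers_tried: Set[int],
              service_unavailable_recovery: bool,
              retries_used: int,
              max_recovery_attempts: int) -> Optional[int]:
    """Pick the next Page Loading Recovery tier for a node, or None to give up.

    tiers_tried holds the tiers already attempted for this node in this session.
    """
    # Tiers 1 and 2 are each attempted at most once per node per session.
    if 1 not in tiers_tried:
        return 1
    if 2 not in tiers_tried:
        return 2
    # Tier 3 (requeue) needs Service Unavailable Recovery and remaining attempts.
    if (3 not in tiers_tried
            and service_unavailable_recovery
            and retries_used < max_recovery_attempts):
        return 3
    return None
```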