Error Handling

Steps can fail. An HTTP request times out, an LLM returns an error, a permission is denied, a called machine throws an exception. By default, a failure halts the machine and returns an error. The on failure block lets you catch failures and recover: provide a fallback, log the error, notify someone, or retry with a different strategy.

on failure basics

Place on failure at the end of a flow. If any step in that flow fails, execution jumps to the on failure block:

implements
  ask fetch_weather, from: "@mashin/actions/http/get"
    url: "https://api.weather.com/current?city=${input.city}"
    assuming
      body: {temp: 72, conditions: "sunny"}
      status: 200
  compute format
    {temperature: steps.fetch_weather.body.temp, source: "live"}
  on failure
    compute fallback
      {
        temperature: null,
        source: "unavailable",
        error: error.message
      }

If the HTTP request fails (timeout, 500 error, DNS failure), the on failure block runs. The machine returns the fallback response instead of an error.

Error context

Inside on failure, the error is available through:

Reference       Description
error.message   Human-readable error description
error.step      Name of the step that failed
error.type      Error category: "timeout", "permission_denied", "runtime"
error.details   Additional error-specific data

on failure
  compute error_report
    {
      failed_step: error.step,
      error_type: error.type,
      message: error.message,
      status: "failed"
    }

Fallback to a different model

A common pattern: try a primary model, fall back to a smaller one if it fails.

machine resilient_classifier
  accepts
    text as text, is required
  responds with
    category as text
    model_used as text
  ensures
    permissions
      allowed to
        llm_call
  implements
    flow classify
      ask primary, using: "anthropic:claude-sonnet-4-6"
        with task "Classify this text into a category.\n\nText: ${input.text}"
        returns
          category as text
        assuming
          category: "general"
      compute result
        {category: steps.primary.category, model_used: "primary"}
      on failure
        ask fallback, using: "anthropic:claude-haiku-4-5"
          with task "Classify this text into a category.\n\nText: ${input.text}"
          returns
            category as text
          assuming
            category: "general"
        compute result
          {category: steps.fallback.category, model_used: "fallback"}

Notify on failure

Fire off an alert when something goes wrong, then return a graceful error:

on failure
  launch send_alert
    machine: "@mashin/actions/notifications/send"
    channel: "slack"
    message: "Pipeline failed at ${error.step}: ${error.message}"
  compute error_response
    {status: "failed", step: error.step, message: error.message}

launch is fire-and-forget: the alert is sent asynchronously and does not block the error response.

Per-flow error handling

In multi-flow machines, each flow has its own on failure:

implements
  flows
    flow extract
      ask fetch, from: "@mashin/actions/http/get"
        url: input.url
        assuming
          body: {}
          status: 200
      on failure
        compute extract_error
          {stage: "extract", error: error.message}
    flow transform
      compute process
        {data: steps.fetch.body}
      on failure
        compute transform_error
          {stage: "transform", error: error.message}

A failure in extract does not trigger transform’s error handler. Each flow manages its own recovery.

Runtime retry configuration

For automatic retry of transient failures, use runtime configuration:

implements
  runtime
    max_retries: 3
    timeout: 30000

This retries any failed step up to 3 times before triggering on failure. Use this for transient errors (network timeouts, rate limits) rather than permanent failures (invalid input, permission denied).
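As a sketch of how the two mechanisms compose (assuming, as the snippets above suggest, that a runtime block and steps can share one implements block): the step below is retried up to three times, and only if every attempt fails does the on failure block run.

implements
  runtime
    max_retries: 3
    timeout: 30000
  ask fetch, from: "@mashin/actions/http/get"
    url: input.url
    assuming
      body: {}
      status: 200
  on failure
    compute fallback
      {status: "failed", error: error.message}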

What gets recorded

Both the failure and the recovery are recorded in the behavioral ledger:

  1. The original step failure (step name, error type, message)
  2. Each step in the on failure block (with normal step recording)
  3. The final output of the recovery path

This makes error-and-recovery sequences fully auditable. You can answer: “What failed? What did the machine do about it?”

If the on failure block itself fails, the machine halts with the secondary error. There is no nested error handling.
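To reduce that risk, keep recovery paths to steps that are unlikely to fail, for example a single compute over values that are always available in the error context:

on failure
  compute safe_fallback
    {status: "failed", step: error.step, message: error.message}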

Try it

Build a machine that calls an HTTP API. Add an on failure block that returns cached data when the API is unavailable. Use error.type to distinguish between timeouts and other failures, returning different messages for each.

Next steps