
Module 08: Production Patterns

Ship agents that are reliable, governed, cost-aware, and self-correcting.

Learning Objectives

  • Define goals (explicit success criteria) with achieve, metric, constraint, and example
  • Handle errors with on_error: chains and error classification
  • Add reflection steps (quality checks that evaluate output before returning a result)
  • Manage cost with model selection, iteration limits, and token budgets

Complexity Ladder: Applies to all levels — production patterns make any machine reliable and governed.

The Concept: From Demo to Production

Building an agent that works in a demo is like cooking a meal at home. Production is running a restaurant — you need health codes (governance), quality standards (goals), backup plans when things go wrong (error handling), and cost management (budget controls).

Concretely, production agents need:

  • Goals — explicit success criteria that are measurable
  • Error handling — graceful recovery, not silent failures
  • Reflection — quality checks before returning results
  • Cost awareness — agents get expensive fast without guardrails
                 Error Occurs

┌──────────┐     ┌──────────┐     ┌──────────┐
│  RETRY   │────►│  ADJUST  │────►│ ESCALATE │
│          │     │          │     │          │
│  Same    │     │ Different│     │ Give up  │
│  action  │     │ approach │     │ safely   │
│          │     │          │     │          │
│ "API is  │     │ "Query   │     │ "No      │
│  slow"   │     │  was     │     │  permis- │
│          │     │  wrong"  │     │  sion"   │
└──────────┘     └──────────┘     └──────────┘
 Transient        Fixable          Unrecoverable
 failure          issue            problem

When something goes wrong, you don’t just crash. You classify the error and respond appropriately: retry transient failures, adjust your approach for fixable issues, and escalate gracefully when recovery is impossible.

This module covers all four of these concerns, plus loop detection.

In this module:

  1. Goals — explicit success criteria
  2. Error Handling — retry, adjust, escalate
  3. Reflection — quality evaluation before returning
  4. Cost Awareness — model selection and token budgets
  5. Loop Detection — preventing repetitive behavior

Start With Koda


Before diving into the syntax details, try adding production patterns to an existing machine with Koda:

Ask Koda:

“Take the file explorer agent from the previous module and add: a goal block with accuracy and latency metrics, error handling for when file reads fail, and a reflection step that checks whether the answer actually addresses the original question.”

Check that Koda adds all three elements without breaking the ReAct loop. The goal block should have an achieve statement, at least one metric, and a constraint. Error handling should use on_error: at the flow level. The reflection step should run after the agent responds. Then continue reading to understand the patterns Koda used.

Goals: Defining Success

The goal block, written with achieves, makes success criteria explicit and measurable:

achieves
  goal "Reliably answer questions with cited sources"
  succeeds when ">= 0.95 accuracy against ground truth"
  succeeds when "p95 response time <= 3000ms"
  never "state facts not present in retrieved context"
  for example
    given { question: "What is the return policy?" }
    expect { answer: "30-day return policy...", grounded: true }
  optimizes
    metrics
      accuracy
        type: quality
        target: ">= 0.95"
        measurement: "Percentage of correct answers verified against ground truth"
      grounding
        type: quality
        target: "1.0"
        measurement: "All factual claims have citations"
      latency
        type: latency
        target: "<= 3000ms"
        measurement: "p95 response time"
  ensures
    guards
      no_hallucination
        type: hard
        on_violation: "block"
        description: "Never state facts not present in retrieved context"
      cost_budget
        type: soft
        description: "Keep average cost under $0.05 per query"

Goal Components

Component   Purpose                          Example
achieve     One-sentence success definition  "Reliably answer questions with cited sources"
metric      Measurable quality indicator     Accuracy >= 95%, latency <= 3s
constraint  Hard or soft rule                "Never hallucinate", "Stay under budget"
example     Input/output pair for testing    Question -> expected answer

Hard constraints (type: hard) must never be violated — the runtime fails if they are broken. Soft constraints (type: soft) are preferences that can bend under pressure.

Goals serve two purposes: they guide evaluation and they provide context for the LLM. Including goals in your machine makes intentions explicit to both humans reviewing the code and AI tools like Koda.

Error Handling

Chain-Level Error Handling

Errors are handled at the flow level with the on_error: option:

implements
  flow main, on_error: flow(handle_error)
    ask api_call, from: "@mashin/actions/http/get"
      url: input.api_url

  flow handle_error
    // Use AI to classify the error into one of three categories
    ask classify_error, using: "anthropic:claude-haiku-4"
      temperature: 0.1
      with task """
        An error occurred: ${error()}
        Classify this error:
        - RETRY: Transient issue, try again
        - ADJUST: Need different parameters
        - ESCALATE: Cannot recover automatically
        """
      returns
        classification as string, choices: ["RETRY", "ADJUST", "ESCALATE"]
        message as string
        adjustment as map

    // Route to the appropriate recovery strategy
    match step(classify_error, :classification)
      case_of "RETRY"
        compute retry_action
          """
          run flow(:main)
          """
      case_of "ADJUST"
        ask adjusted_call, from: "@mashin/actions/http/get"
          url: step(classify_error, :adjustment)[:url]
      otherwise
        compute escalate
          """
          %{
            error: true,
            message: step(:classify_error, :message),
            needs_human: true
          }
          """

The Three Error Levels

Level     When                                      Response
Retry     Transient failure (timeout, rate limit)   Try again with backoff
Adjust    Wrong parameters or approach              Modify and retry
Escalate  Unrecoverable                             Return error, notify, request human

Using an LLM to classify errors is powerful — it can distinguish “the API is down” (retry) from “the query is malformed” (adjust) from “we don’t have permission” (escalate).
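For comparison, a deterministic keyword heuristic can stand in for the LLM classifier, or serve as a fallback when the classifier itself fails. A minimal Python sketch (not Mashin syntax); the keyword lists here are illustrative assumptions, not part of any runtime:

```python
# Illustrative keyword lists -- a real system would tune these or use an LLM
TRANSIENT = ("timeout", "rate limit", "503", "connection reset")
FIXABLE = ("malformed", "invalid parameter", "bad request")

def classify_error(message: str) -> str:
    """RETRY transient failures, ADJUST fixable requests, ESCALATE the rest."""
    text = message.lower()
    if any(keyword in text for keyword in TRANSIENT):
        return "RETRY"
    if any(keyword in text for keyword in FIXABLE):
        return "ADJUST"
    return "ESCALATE"

print(classify_error("HTTP 503: service unavailable"))   # RETRY
print(classify_error("bad request: malformed query"))    # ADJUST
print(classify_error("permission denied for resource"))  # ESCALATE
```

The trade-off is the usual one: keywords are cheap and predictable but brittle; the LLM generalizes to error messages you never anticipated.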

Retry with Exponential Backoff

Exponential backoff (waiting progressively longer between retries: 500ms, 1s, 2s, 4s…) prevents hammering a struggling service:

implements
  flow main
    try
      retry max: 3, backoff: exponential, base_delay: 500
      ask api_call, from: "@mashin/actions/http/get"
        url: input.api_url
    catch
      compute fallback
        {data: {error: "API unavailable"}, source: "fallback"}
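The schedule itself is simple arithmetic: the delay before retry attempt n is base_delay * 2^n. A Python sketch of that arithmetic, not Mashin syntax:

```python
def backoff_delays(max_retries: int = 3, base_delay_ms: int = 500) -> list:
    """Delay before each retry: base_delay * 2^attempt (500ms, 1s, 2s, ...)."""
    return [base_delay_ms * (2 ** attempt) for attempt in range(max_retries)]

print(backoff_delays())  # [500, 1000, 2000]
```

In practice you would also add jitter (a random fraction of each delay) so that many clients retrying at once don't synchronize their retries against the same struggling service.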

Reflection: Quality Checks

Add a reflection step after the main task to evaluate quality before returning:

implements
  // Step 1: Generate the initial response
  ask generate, using: "anthropic:claude-sonnet-4"
    with task "Write a response to: ${input.question}"
    returns
      response as string

  // Step 2: Reflect: evaluate the response before returning it
  ask evaluate, using: "anthropic:claude-haiku-4"
    temperature: 0.2
    with role "You are a critical reviewer. Evaluate responses for accuracy and completeness."
    with task """
      Original question: ${input.question}
      Generated response: ${steps.generate.response}
      Evaluate:
      1. Does this answer the question?
      2. Are there factual errors?
      3. Is anything missing?
      """
    returns
      passes as boolean
      issues as list
      suggestions as list

  // Step 3: If reflection fails, regenerate with feedback
  if !step(evaluate, :passes)
    ask regenerate, using: "anthropic:claude-sonnet-4"
      with task """
        Your previous response had issues: ${steps.evaluate.issues}
        Suggestions: ${steps.evaluate.suggestions}
        Rewrite your response to: ${input.question}
        """
      returns
        response as string
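Stripped of the DSL, the generate/evaluate/regenerate pattern is a short loop. A Python sketch, where generate and evaluate are placeholders for the two model calls (their signatures here are assumptions for illustration):

```python
def reflect_and_retry(question, generate, evaluate, max_rounds=2):
    """Generate a response, evaluate it, and regenerate with feedback
    until it passes or the round budget is spent."""
    response = generate(question, feedback=None)
    for _ in range(max_rounds):
        verdict = evaluate(question, response)  # e.g. {"passes": bool, "issues": [...]}
        if verdict["passes"]:
            return response
        response = generate(question, feedback=verdict["issues"])
    return response  # best effort after max_rounds
```

Note that max_rounds plays the same role as an iteration cap: without it, a persistently failing evaluation would loop (and bill you) forever.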

Anticipatory Reflection (Devil’s Advocate)

Challenge a plan before executing it:

// Step 1: Generate an initial plan
ask generate_plan, using: "anthropic:claude-sonnet-4"
  with task "Create a plan to: ${input.goal}"
  returns
    steps as list

// Step 2: Challenge the plan
ask challenge_plan, using: "anthropic:claude-sonnet-4"
  with role "You are a critical reviewer. Find weaknesses in plans."
  with task """
    Proposed plan: ${steps.generate_plan.steps}
    What could go wrong? What edge cases aren't handled?
    """
  returns
    risks as list
    missing_cases as list

// Step 3: Refine the plan incorporating the critique
ask refine_plan, using: "anthropic:claude-sonnet-4"
  with task """
    Original plan: ${steps.generate_plan.steps}
    Identified risks: ${steps.challenge_plan.risks}
    Create an improved plan that addresses these concerns.
    """

Cost Awareness

Agent costs grow quadratically — each iteration processes all previous context plus new data. A 10-iteration agent with a large context window can cost 10-50x a single inference call.
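To see why, count input tokens: iteration i re-reads the starting context plus everything produced by earlier steps, so per-step input grows linearly and the total grows quadratically. A back-of-the-envelope Python sketch with made-up token counts:

```python
def agent_input_tokens(iterations, context_tokens=4000, tokens_per_step=1000):
    """Total input tokens across a run: each iteration re-reads the base
    context plus everything produced so far, so the sum is quadratic."""
    return sum(context_tokens + i * tokens_per_step for i in range(iterations))

# Illustrative numbers only: 10 iterations cost ~21x one call, not 10x
print(agent_input_tokens(1), agent_input_tokens(10))  # 4000 85000
```

The exact multiplier depends on context size and how chatty the tool results are, which is why the strategies below focus on shrinking both.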

Cost Control Strategies

1. Right-size models per step:

// Fast model for routing (cheap)
ask route, using: "anthropic:claude-haiku-4"  // $0.25/M tokens
  with task "Route this request: ${input.request}"

// Balanced model for synthesis (moderate)
ask synthesize, using: "anthropic:claude-sonnet-4"  // $3/M tokens
  with task "Synthesize these findings: ${steps.research.findings}"

Don’t use Sonnet for classification or Opus for extraction. Match the model to the task.

2. Set iteration limits:

accepts
  max_iterations as integer, default: 8  // Hard cap on loop iterations

implements
  compute check_limit
    """
    if state(:iteration) > input(:max_iterations) do
      %{state: %{final_response: "Reached iteration limit"}, exceeded: true}
    else
      %{exceeded: false}
    end
    """

3. Compact tool results:

Don’t feed entire web pages into context. Summarize or truncate tool results before the next reasoning round to prevent context overflow (too much data in the AI prompt):

compute compact_result
  """
  result = step(:tool_exec)
  compact = String.slice(inspect(result), 0, 2000)
  %{compact_result: compact}
  """

4. Fast filter, then detailed analysis:

// Cheap model decides if detailed analysis is worth the cost
ask quick_filter, using: "anthropic:claude-haiku-4"
  with task "Is this worth detailed analysis? Yes/no: ${input.text}"
  returns
    worth_analyzing as boolean

// Only run the expensive model when the cheap filter says yes
if step(quick_filter, :worth_analyzing)
  ask detailed, using: "anthropic:claude-sonnet-4"
    with task "Provide detailed analysis: ${input.text}"

Loop Detection

Loop detection (checking if the agent is repeating the same action) prevents agents from wasting tokens on repetitive behavior:

implements
  state
    action_history: list, default: []  // Track what the agent has done

  compute check_loop
    """
    current_action = step(:reason, :action)
    recent = state(:action_history) |> Enum.take(3)
    is_loop = current_action in recent
    %{is_loop: is_loop}
    """

  // If a loop is detected, force the agent to try something different
  if step(check_loop, :is_loop)
    ask break_loop, using: "anthropic:claude-sonnet-4"
      with task """
        You've been repeating the same action: ${steps.reason.action}
        Recent actions: ${state.action_history}
        Try a completely different approach.
        """

Common Failure Modes

Failure           Description                                  Mitigation
Hallucination     Making up facts not in the provided context  RAG (Retrieval-Augmented Generation) grounding, low temperature, citations
Action loops      Repeating the same action                    Loop detection, max iterations
Context overflow  Too much data in the AI prompt               Summarization, selective retrieval
Cascading errors  An error propagates through steps            Error boundaries, on_error: chains
Goal drift        Losing sight of the objective                Explicit goal checks, reflection steps

Pre-Deployment Checklist

Before deploying any machine:

  • Workflow vs Agent? Is the task predictable enough for a workflow?
  • Perception complete? Are all inputs gathered and validated?
  • Reasoning decomposed? Is each LLM step focused on ONE task?
  • Memory grounded? Are responses backed by retrieved knowledge rather than training data alone?
  • Execution resilient? Are there error handlers and retries?
  • Reflection added? Are there quality checks on outputs?
  • Loops prevented? Is there max iteration / loop detection?
  • Goals defined? Are success criteria explicit and measurable?
  • Cost budgeted? Are models right-sized and iterations limited?

Key Syntax

// Goal block
achieves
  goal "success definition"
  succeeds when "measurable threshold"
  never "constraint that must hold"
  for example
    given { ... }
    expect { ... }

// Error handling
implements
  flow name, on_error: flow(handler)
    ...

// Retry with exponential backoff
try
  retry max: 3, backoff: exponential
  ...
catch
  ...  // Fallback when all retries exhausted

Course Complete

You’ve now covered the full spectrum — from understanding what agents are (Module 01) to shipping them in production (Module 08). The key takeaways:

  1. Start simple. Most tasks don’t need agents. Use the Complexity Ladder.
  2. ask + tools = agent. No special framework needed.
  3. State and memory give agents working memory and long-term knowledge.
  4. Compose, don’t complicate. Small specialist machines beat one giant agent.
  5. Govern everything. Goals, error handling, reflection, cost limits.

For next steps, explore the Golden Examples for more patterns, and try building a real agent for your own use case with Koda.