
Module 08: Production Patterns

Ship agents that are reliable, governed, cost-aware, and self-correcting.

Learning Objectives

  • Define goals (explicit success criteria) with achieve, metric, constraint, and example
  • Handle errors with on_error: chains and error classification
  • Add reflection steps (quality checks that evaluate output before returning a result)
  • Manage cost with model selection, iteration limits, and token budgets

Complexity Ladder: Applies to all levels — production patterns make any machine reliable and governed.

The Concept: From Demo to Production

Building an agent that works in a demo is like cooking a meal at home. Production is running a restaurant — you need health codes (governance), quality standards (goals), backup plans when things go wrong (error handling), and cost management (budget controls).

Concretely, production agents need:

  • Goals — explicit success criteria that are measurable
  • Error handling — graceful recovery, not silent failures
  • Reflection — quality checks before returning results
  • Cost awareness — agents get expensive fast without guardrails
                 Error Occurs

┌──────────┐     ┌──────────┐     ┌──────────┐
│  RETRY   │────►│  ADJUST  │────►│ ESCALATE │
│          │     │          │     │          │
│  Same    │     │ Different│     │ Give up  │
│  action  │     │ approach │     │ safely   │
│          │     │          │     │          │
│ "API is  │     │ "Query   │     │ "No      │
│  slow"   │     │  was     │     │  permis- │
│          │     │  wrong"  │     │  sion"   │
└──────────┘     └──────────┘     └──────────┘
 Transient        Fixable          Unrecoverable
 failure          issue            problem

When something goes wrong, you don’t just crash. You classify the error and respond appropriately: retry transient failures, adjust your approach for fixable issues, and escalate gracefully when recovery is impossible.

This module covers all four of these concerns, plus loop detection.

In this module:

  1. Goals — explicit success criteria
  2. Error Handling — retry, adjust, escalate
  3. Reflection — quality evaluation before returning
  4. Cost Awareness — model selection and token budgets
  5. Loop Detection — preventing repetitive behavior

Start With Koda


Before diving into the syntax details, try adding production patterns to an existing machine with Koda:

Ask Koda:

“Take the file explorer agent from the previous module and add: a goal block with accuracy and latency metrics, error handling for when file reads fail, and a reflection step that checks whether the answer actually addresses the original question.”

Check that Koda adds all three elements without breaking the ReAct loop. The goal block should have an achieve statement, at least one metric, and a constraint. Error handling should use on_error: at the flow level. The reflection step should run after the agent responds. Then continue reading to understand the patterns Koda used.

Goals: Defining Success

The goal block, written with achieves, makes success criteria explicit and measurable:

achieves
  goal "Reliably answer questions with cited sources"
  succeeds when ">= 0.95 accuracy against ground truth"
  succeeds when "p95 response time <= 3000ms"
  never "state facts not present in retrieved context"
  for example
    given { question: "What is the return policy?" }
    expect { answer: "30-day return policy...", grounded: true }
  optimizes
    metrics
      accuracy
        type: quality
        target: ">= 0.95"
        measurement: "Percentage of correct answers verified against ground truth"
      grounding
        type: quality
        target: "1.0"
        measurement: "All factual claims have citations"
      latency
        type: latency
        target: "<= 3000ms"
        measurement: "p95 response time"
  ensures
    guards
      no_hallucination
        type: hard
        on_violation: "block"
        description: "Never state facts not present in retrieved context"
      cost_budget
        type: soft
        description: "Keep average cost under $0.05 per query"

Goal Components

Component   Purpose                          Example
achieve     One-sentence success definition  "Reliably answer questions with cited sources"
metric      Measurable quality indicator     Accuracy >= 95%, latency <= 3s
constraint  Hard or soft rule                "Never hallucinate", "Stay under budget"
example     Input/output pair for testing    Question -> expected answer

Hard constraints (type: hard) must never be violated — the runtime fails if they are broken. Soft constraints (type: soft) are preferences that can bend under pressure.

Goals serve two purposes: they guide evaluation and they provide context for the LLM. Including goals in your machine makes intentions explicit to both humans reviewing the code and AI tools like Koda.

Error Handling

Chain-Level Error Handling

Errors are handled at the flow level with the on_error: option:

implements
  flow main, on_error: flow(handle_error)
    ask api_call, from: "@mashin/actions/http/get"
      url: input.api_url

  flow handle_error
    // Use AI to classify the error into one of three categories
    ask classify_error, using: "anthropic:claude-haiku-4"
      temperature: 0.1
      with task """
        An error occurred: ${error()}
        Classify this error:
        - RETRY: Transient issue, try again
        - ADJUST: Need different parameters
        - ESCALATE: Cannot recover automatically
        """
      returns
        classification as string, choices: ["RETRY", "ADJUST", "ESCALATE"]
        message as string
        adjustment as map

    // Route to the appropriate recovery strategy
    match step(classify_error, :classification)
      case_of "RETRY"
        compute retry_action
          """
          run flow(:main)
          """
      case_of "ADJUST"
        ask adjusted_call, from: "@mashin/actions/http/get"
          url: step(classify_error, :adjustment)[:url]
      otherwise
        compute escalate
          """
          %{
            error: true,
            message: step(:classify_error, :message),
            needs_human: true
          }
          """

The Three Error Levels

Level     When                                      Response
Retry     Transient failure (timeout, rate limit)   Try again with backoff
Adjust    Wrong parameters or approach              Modify and retry
Escalate  Unrecoverable                             Return error, notify, request human

Using an LLM to classify errors is powerful — it can distinguish “the API is down” (retry) from “the query is malformed” (adjust) from “we don’t have permission” (escalate).
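For comparison, a deterministic keyword heuristic can stand in for the LLM classifier, or serve as a fallback when the classifier itself fails. A minimal Python sketch (not Mashin syntax); the keyword lists here are illustrative assumptions, not part of any runtime:

```python
# Illustrative keyword lists -- a real system would tune these or use an LLM
TRANSIENT = ("timeout", "rate limit", "503", "connection reset")
FIXABLE = ("malformed", "invalid parameter", "bad request")

def classify_error(message: str) -> str:
    """RETRY transient failures, ADJUST fixable requests, ESCALATE the rest."""
    text = message.lower()
    if any(keyword in text for keyword in TRANSIENT):
        return "RETRY"
    if any(keyword in text for keyword in FIXABLE):
        return "ADJUST"
    return "ESCALATE"

print(classify_error("HTTP 503: service unavailable"))   # RETRY
print(classify_error("bad request: malformed query"))    # ADJUST
print(classify_error("permission denied for resource"))  # ESCALATE
```

The trade-off is the usual one: keywords are cheap and predictable but brittle; the LLM generalizes to error messages you never anticipated.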

Retry with Exponential Backoff

Exponential backoff (waiting progressively longer between retries: 500ms, 1s, 2s, 4s…) prevents hammering a struggling service:

implements
  flow main
    try
      retry max: 3, backoff: exponential, base_delay: 500
      ask api_call, from: "@mashin/actions/http/get"
        url: input.api_url
    catch
      compute fallback
        {data: {error: "API unavailable"}, source: "fallback"}
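The schedule itself is simple arithmetic: the delay before retry attempt n is base_delay * 2^n. A Python sketch of that arithmetic, not Mashin syntax:

```python
def backoff_delays(max_retries: int = 3, base_delay_ms: int = 500) -> list:
    """Delay before each retry: base_delay * 2^attempt (500ms, 1s, 2s, ...)."""
    return [base_delay_ms * (2 ** attempt) for attempt in range(max_retries)]

print(backoff_delays())  # [500, 1000, 2000]
```

In practice you would also add jitter (a random fraction of each delay) so that many clients retrying at once don't synchronize their retries against the same struggling service.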

Reflection: Quality Checks

Add a reflection step after the main task to evaluate quality before returning:

implements
  // Step 1: Generate the initial response
  ask generate, using: "anthropic:claude-sonnet-4"
    with task "Write a response to: ${input.question}"
    returns
      response as string

  // Step 2: Reflect: evaluate the response before returning it
  ask evaluate, using: "anthropic:claude-haiku-4"
    temperature: 0.2
    with role "You are a critical reviewer. Evaluate responses for accuracy and completeness."
    with task """
      Original question: ${input.question}
      Generated response: ${steps.generate.response}
      Evaluate:
      1. Does this answer the question?
      2. Are there factual errors?
      3. Is anything missing?
      """
    returns
      passes as boolean
      issues as list
      suggestions as list

  // Step 3: If reflection fails, regenerate with feedback
  if !step(evaluate, :passes)
    ask regenerate, using: "anthropic:claude-sonnet-4"
      with task """
        Your previous response had issues: ${steps.evaluate.issues}
        Suggestions: ${steps.evaluate.suggestions}
        Rewrite your response to: ${input.question}
        """
      returns
        response as string
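Stripped of the DSL, the generate/evaluate/regenerate pattern is a short loop. A Python sketch, where generate and evaluate are placeholders for the two model calls (their signatures here are assumptions for illustration):

```python
def reflect_and_retry(question, generate, evaluate, max_rounds=2):
    """Generate a response, evaluate it, and regenerate with feedback
    until it passes or the round budget is spent."""
    response = generate(question, feedback=None)
    for _ in range(max_rounds):
        verdict = evaluate(question, response)  # e.g. {"passes": bool, "issues": [...]}
        if verdict["passes"]:
            return response
        response = generate(question, feedback=verdict["issues"])
    return response  # best effort after max_rounds
```

Note that max_rounds plays the same role as an iteration cap: without it, a persistently failing evaluation would loop (and bill you) forever.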

Anticipatory Reflection (Devil’s Advocate)

Challenge a plan before executing it:

// Step 1: Generate an initial plan
ask generate_plan, using: "anthropic:claude-sonnet-4"
  with task "Create a plan to: ${input.goal}"
  returns
    steps as list

// Step 2: Challenge the plan
ask challenge_plan, using: "anthropic:claude-sonnet-4"
  with role "You are a critical reviewer. Find weaknesses in plans."
  with task """
    Proposed plan: ${steps.generate_plan.steps}
    What could go wrong? What edge cases aren't handled?
    """
  returns
    risks as list
    missing_cases as list

// Step 3: Refine the plan incorporating the critique
ask refine_plan, using: "anthropic:claude-sonnet-4"
  with task """
    Original plan: ${steps.generate_plan.steps}
    Identified risks: ${steps.challenge_plan.risks}
    Create an improved plan that addresses these concerns.
    """

Cost Awareness

Agent costs grow quadratically — each iteration processes all previous context plus new data. A 10-iteration agent with a large context window can cost 10-50x a single inference call.
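To see why, count input tokens: iteration i re-reads the starting context plus everything produced by earlier steps, so per-step input grows linearly and the total grows quadratically. A back-of-the-envelope Python sketch with made-up token counts:

```python
def agent_input_tokens(iterations, context_tokens=4000, tokens_per_step=1000):
    """Total input tokens across a run: each iteration re-reads the base
    context plus everything produced so far, so the sum is quadratic."""
    return sum(context_tokens + i * tokens_per_step for i in range(iterations))

# Illustrative numbers only: 10 iterations cost ~21x one call, not 10x
print(agent_input_tokens(1), agent_input_tokens(10))  # 4000 85000
```

The exact multiplier depends on context size and how chatty the tool results are, which is why the strategies below focus on shrinking both.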

Cost Control Strategies

1. Right-size models per step:

// Fast model for routing (cheap)
ask route, using: "anthropic:claude-haiku-4"  // $0.25/M tokens
  with task "Route this request: ${input.request}"

// Balanced model for synthesis (moderate)
ask synthesize, using: "anthropic:claude-sonnet-4"  // $3/M tokens
  with task "Synthesize these findings: ${steps.research.findings}"

Don’t use Sonnet for classification or Opus for extraction. Match the model to the task.

2. Set iteration limits:

accepts
  max_iterations as integer, default: 8  // Hard cap on loop iterations

implements
  compute check_limit
    """
    if state(:iteration) > input(:max_iterations) do
      %{state: %{final_response: "Reached iteration limit"}, exceeded: true}
    else
      %{exceeded: false}
    end
    """

3. Compact tool results:

Don’t feed entire web pages into context. Summarize or truncate tool results before the next reasoning round to prevent context overflow (too much data in the AI prompt):

compute compact_result
  """
  result = step(:tool_exec)
  compact = String.slice(inspect(result), 0, 2000)
  %{compact_result: compact}
  """

4. Fast filter, then detailed analysis:

// Cheap model decides if detailed analysis is worth the cost
ask quick_filter, using: "anthropic:claude-haiku-4"
  with task "Is this worth detailed analysis? Yes/no: ${input.text}"
  returns
    worth_analyzing as boolean

// Only run the expensive model when the cheap filter says yes
if step(quick_filter, :worth_analyzing)
  ask detailed, using: "anthropic:claude-sonnet-4"
    with task "Provide detailed analysis: ${input.text}"

Loop Detection

Loop detection (checking if the agent is repeating the same action) prevents agents from wasting tokens on repetitive behavior:

implements
  state
    action_history: list, default: []  // Track what the agent has done

  compute check_loop
    """
    current_action = step(:reason, :action)
    recent = state(:action_history) |> Enum.take(3)
    is_loop = current_action in recent
    %{is_loop: is_loop}
    """

  // If a loop is detected, force the agent to try something different
  if step(check_loop, :is_loop)
    ask break_loop, using: "anthropic:claude-sonnet-4"
      with task """
        You've been repeating the same action: ${steps.reason.action}
        Recent actions: ${state.action_history}
        Try a completely different approach.
        """

Common Failure Modes

Failure           Description                                  Mitigation
Hallucination     Making up facts not in the provided context  RAG (Retrieval-Augmented Generation) grounding, low temperature, citations
Action loops      Repeating the same action                    Loop detection, max iterations
Context overflow  Too much data in the AI prompt               Summarization, selective retrieval
Cascading errors  An error propagates through steps            Error boundaries, on_error: chains
Goal drift        Losing sight of the objective                Explicit goal checks, reflection steps

Pre-Deployment Checklist

Before deploying any machine:

  • Workflow vs Agent? Is the task predictable enough for a workflow?
  • Perception complete? Are all inputs gathered and validated?
  • Reasoning decomposed? Is each LLM step focused on ONE task?
  • Memory grounded? Are responses backed by retrieved knowledge rather than training data alone?
  • Execution resilient? Are there error handlers and retries?
  • Reflection added? Are there quality checks on outputs?
  • Loops prevented? Is there max iteration / loop detection?
  • Goals defined? Are success criteria explicit and measurable?
  • Cost budgeted? Are models right-sized and iterations limited?

Key Syntax

// Goal block
achieves
  goal "success definition"
  succeeds when "measurable threshold"
  never "constraint that must hold"
  for example
    given { ... }
    expect { ... }

// Error handling
implements
  flow name, on_error: flow(handler)
    ...

// Retry with exponential backoff
try
  retry max: 3, backoff: exponential
  ...
catch
  ...  // Fallback when all retries exhausted

Course Complete

You’ve now covered the full spectrum — from understanding what agents are (Module 01) to shipping them in production (Module 08). The key takeaways:

  1. Start simple. Most tasks don’t need agents. Use the Complexity Ladder.
  2. ask + tools = agent. No special framework needed.
  3. State and memory give agents working memory and long-term knowledge.
  4. Compose, don’t complicate. Small specialist machines beat one giant agent.
  5. Govern everything. Goals, error handling, reflection, cost limits.

For next steps, explore the Golden Examples for more patterns, and try building a real agent for your own use case with Koda.