AI agents are built on the primitive of an LLM and tool-call loop, often with additional processes for data fetching, resource provisioning, or reacting to external events. Workflow DevKit makes your agents production-ready by turning them into durable, resumable workflows. It transforms your LLM calls, tool executions, and other async operations into retryable, scalable, and observable steps.

Why Durable Agents?

Beyond the usual work of making long-running tasks production-ready, building mature AI agents typically requires solving several additional problems:
  • Statefulness: Persisting chat sessions and turning LLM and tool calls into async jobs with workers and queues.
  • Observability: Using services to collect traces and metrics, and managing them separately from your messages and user history.
  • Resumability: Resuming streams requires storing not just your messages but also the streams themselves, and piping them across services.
  • Human-in-the-loop: Your client, API, and async job orchestration need to work together to create, track, route to, and display human approval requests, or similar webhook operations.
Workflow DevKit provides all of these capabilities out of the box. Your agent becomes a workflow, your tools become steps, and the framework handles the coordination with your existing infrastructure.

Getting Started

To make an Agent durable, we first need an Agent. This guide assumes you have a basic chat application using the AI SDK. If you’re starting from scratch, check out the AI SDK documentation first.
1. Install Dependencies

Add the Workflow DevKit packages to your project:
npm i workflow @workflow/ai
Extend your Next.js config to transform workflow code:
next.config.ts
import { withWorkflow } from "workflow/next";
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // ... rest of your Next.js config
};

export default withWorkflow(nextConfig);
2. Create a Workflow Function

Move your agent logic into a workflow function:
workflows/chat/workflow.ts
import { DurableAgent } from "@workflow/ai/agent";
import { getWritable } from "workflow";
import type { ModelMessage, UIMessageChunk } from "ai";
import { z } from "zod";

export async function chatWorkflow(messages: ModelMessage[]) {
  "use workflow";

  const writable = getWritable<UIMessageChunk>();

  const agent = new DurableAgent({
    model: "anthropic/claude-opus",
    system: "You are a helpful assistant.",
    tools: {
      getWeather: {
        description: "Get weather for a location",
        inputSchema: z.object({ location: z.string() }),
        execute: getWeatherStep,
      },
    },
  });

  await agent.stream({
    messages,
    writable,
  });
}

async function getWeatherStep({ location }: { location: string }) {
  "use step";
  
  // This step is automatically retried on failure
  const response = await fetch(`https://api.weather.com/${location}`);
  return response.json();
}
Key changes:
  • Add "use workflow" directive to mark the function as a workflow
  • Replace Agent with DurableAgent from @workflow/ai/agent
  • Use getWritable() to get a persistent stream for agent output
  • Mark tool implementations with "use step" for automatic retries
3. Update the API Route

Replace the agent call with start() to run the workflow:
app/api/chat/route.ts
import type { UIMessage } from "ai";
import { convertToModelMessages, createUIMessageStreamResponse } from "ai";
import { start } from "workflow/api";
import { chatWorkflow } from "@/workflows/chat/workflow";

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
  const modelMessages = convertToModelMessages(messages);

  const run = await start(chatWorkflow, [modelMessages]);

  return createUIMessageStreamResponse({
    stream: run.readable,
  });
}
Key changes:
  • Call start() to run the workflow function
  • Return run.readable as the stream instead of creating a new one
4. Convert Tools to Steps

Mark all tool implementations with "use step" to make them durable:
workflows/chat/steps/tools.ts
export async function getWeather({ location }: { location: string }) {
  "use step";

  const response = await fetch(`https://api.weather.com/${location}`);
  return response.json();
}

export async function searchFlights({
  from,
  to,
  date,
}: {
  from: string;
  to: string;
  date: string;
}) {
  "use step";

  // This step is automatically retried on failure
  const response = await fetch(`https://api.flights.com/search`, {
    method: "POST",
    body: JSON.stringify({ from, to, date }),
  });

  return response.json();
}
With "use step":
  • The tool execution runs in a separate step with full Node.js access
  • Failed tool calls are automatically retried (up to 3 times by default)
  • Each tool execution appears as a discrete step in observability tools
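To make the retry semantics concrete, here is a minimal plain-TypeScript sketch of the retry-with-backoff behavior a step receives automatically. This is an illustration, not the framework's actual implementation; the attempt count and backoff schedule here are illustrative, and any real retry policy is configured through Workflow DevKit itself, not by wrapping your code like this:

```typescript
// Sketch only: "use step" gives you behavior like this for free.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff between attempts: 100ms, 200ms, 400ms, ...
        await new Promise((r) =>
          setTimeout(r, baseDelayMs * 2 ** (attempt - 1)),
        );
      }
    }
  }
  throw lastError;
}

// Example: a flaky operation that fails twice, then succeeds.
let calls = 0;
async function flaky(): Promise<string> {
  calls++;
  if (calls < 3) throw new Error("transient failure");
  return "ok";
}
```

With `"use step"`, the `getWeather` and `searchFlights` functions above get this kind of resilience without any wrapper code in your tool implementations.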
That’s it! Your AI agent is now durable. If you run your development server and send a chat message, you should see your agent respond just as before, but now with added durability and observability.

Observability

From your app directory, run the following command to open the observability dashboard and see your workflow in action:
npx workflow web
This opens a local dashboard showing all workflow runs and their status, as well as a trace viewer to inspect each run in detail, including retry attempts and the data passed between steps.

Next Steps

Now that you have a basic durable agent, explore these additional features:

Core Concepts

DurableAgent vs Agent

DurableAgent is a drop-in replacement for AI SDK’s Agent class that makes all LLM calls and tool executions durable:
  • Automatic persistence: All messages, tool calls, and results are automatically saved
  • Step-based execution: Each LLM call runs as a workflow step with built-in retries
  • Stream persistence: Output streams are durable and can be resumed after interruptions
  • Observability: Full tracing of all agent actions through the workflow dashboard

Workflows and Steps

  • Workflows ("use workflow") orchestrate the overall agent behavior and maintain state
  • Steps ("use step") execute individual operations like API calls with automatic retries
Learn more in Workflows and Steps.
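As a rough mental model, a workflow is a deterministic orchestrator whose step results are recorded, so that on replay a completed step is not re-executed. The sketch below is plain TypeScript with an in-memory journal, purely to illustrate the idea; the real framework persists this journal durably and handles replay for you:

```typescript
// Sketch: a memoizing step runner. Keyed by step name for simplicity;
// a real engine also tracks invocation order and arguments.
const journal = new Map<string, unknown>();

async function runStep<T>(name: string, fn: () => Promise<T>): Promise<T> {
  if (journal.has(name)) {
    // On replay, return the recorded result instead of re-executing.
    return journal.get(name) as T;
  }
  const result = await fn();
  journal.set(name, result);
  return result;
}

let sideEffects = 0;
async function fetchWeather(): Promise<string> {
  sideEffects++; // stands in for a real network call
  return "sunny";
}

// The "workflow": pure orchestration around recorded step results.
async function weatherWorkflow(): Promise<string> {
  const weather = await runStep("getWeather", fetchWeather);
  return `The forecast is ${weather}.`;
}
```

Running `weatherWorkflow` twice performs the side effect only once: the second run replays the journaled result. This is why workflows must stay deterministic while steps are free to do I/O.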

Stream Management

getWritable() provides a persistent stream that:
  • Survives function timeouts and crashes
  • Can be read by multiple clients simultaneously
  • Supports resumption from any point
  • Automatically handles backpressure
Learn more in Streaming.
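The multi-reader behavior can be approximated with standard web streams. The sketch below uses `ReadableStream.tee()` to fan out one stream to two readers; a durable stream from `getWritable()` goes further by persisting chunks, so it also survives crashes and lets late readers resume:

```typescript
// Sketch: two clients reading the same agent output.
const source = new ReadableStream<string>({
  start(controller) {
    controller.enqueue("Hello, ");
    controller.enqueue("world!");
    controller.close();
  },
});

// tee() splits the source into two independent readers.
const [clientA, clientB] = source.tee();

async function readAll(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}
```

Both readers observe the full sequence of chunks independently, which is the property that lets multiple clients follow the same agent run.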
