Build durable workflows with Cloudflare Workflows (GA April 2025). Features step.do, step.sleep, waitForEvent, Vitest testing, automatic retries, and state persistence for long-running tasks. Prevents 12 documented errors. Use when: creating workflows, implementing retries, or troubleshooting NonRetryableError, I/O context, serialization errors, waitForEvent timeouts, getPlatformProxy failures.
87
88%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Status: Production Ready ✅ (GA since April 2025) Last Updated: 2026-01-09 Dependencies: cloudflare-worker-base (for Worker setup) Latest Versions: wrangler@4.58.0, @cloudflare/workers-types@4.20260109.0
Recent Updates (2025):
# 1. Scaffold project
npm create cloudflare@latest my-workflow -- --template cloudflare/workflows-starter --git --deploy false
cd my-workflow
# 2. Configure wrangler.jsonc
{
"name": "my-workflow",
"main": "src/index.ts",
"compatibility_date": "2025-11-25",
"workflows": [{
"name": "my-workflow",
"binding": "MY_WORKFLOW",
"class_name": "MyWorkflow"
}]
}
# 3. Create workflow (src/index.ts)
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from 'cloudflare:workers';
export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
const result = await step.do('process', async () => { /* work */ });
await step.sleep('wait', '1 hour');
await step.do('continue', async () => { /* more work */ });
}
}
# 4. Deploy and test
npm run deploy
npx wrangler workflows instances list my-workflowCRITICAL: Extends WorkflowEntrypoint, implements run() with step methods, bindings in wrangler.jsonc
This skill prevents 12 documented errors with Cloudflare Workflows.
Error: Events sent after a waitForEvent() timeout are ignored in subsequent waitForEvent() calls
Environment: Local development (wrangler dev) only - works correctly in production
Source: GitHub Issue #11740
Why It Happens: Bug in miniflare that was fixed in production (May 2025) but not ported to local emulator. After a timeout, the event queue becomes corrupted for that instance.
Prevention:
waitForEvent() calls where timeouts are expectedExample of Bug:
export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
for (let i = 0; i < 3; i++) {
try {
const evt = await step.waitForEvent(`wait-${i}`, {
type: 'user-action',
timeout: '5 seconds'
});
console.log(`Iteration ${i}: Received event`);
} catch {
console.log(`Iteration ${i}: Timeout`);
}
}
}
}
// In wrangler dev:
// - Iteration 1: ✅ receives event
// - Iteration 2: ⏱️ times out (expected)
// - Iteration 3: ❌ does not receive event (BUG - event is sent but ignored)Status: Known bug, fix pending for miniflare.
Error: MiniflareCoreError [ERR_RUNTIME_FAILURE]: The Workers runtime failed to start
Message: Worker's binding refers to service with named entrypoint, but service has no such entrypoint
Source: GitHub Issue #9402
Why It Happens: getPlatformProxy() from wrangler package doesn't support Workflow bindings (similar to how it handles Durable Objects). This blocks Next.js integration and local CLI scripts.
Prevention:
getPlatformProxy()wrangler.cli.jsonc without workflows for CLI scripts// Workaround: Separate config for CLI scripts
// wrangler.cli.jsonc (no workflows)
{
"name": "my-worker",
"main": "src/index.ts",
"compatibility_date": "2025-01-20"
// workflows commented out
}
// Use in script:
import { getPlatformProxy } from 'wrangler';
const { env } = await getPlatformProxy({ configPath: './wrangler.cli.jsonc' });Status: Known limitation, fix planned (filter workflows similar to DOs).
Error: Instance ID returned but instance.not_found when queried
Environment: Local development (wrangler dev) only - works correctly in production
Source: GitHub Issue #10806
Why It Happens: Returning a redirect immediately after workflow.create() causes request to "soft abort" before workflow initialization completes (single-threaded execution in dev).
Prevention: Use ctx.waitUntil() to ensure workflow initialization completes before redirect:
export default {
async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
const workflow = await env.MY_WORKFLOW.create({ params: { userId: '123' } });
// ✅ Ensure workflow initialization completes
ctx.waitUntil(workflow.status());
return Response.redirect('/dashboard', 302);
}
};Status: Fixed in recent wrangler versions (post-Sept 2025), but workaround still recommended for compatibility.
Error: [vitest-worker]: Timeout calling "resolveId"
Environment: CI/CD pipelines (GitLab, GitHub Actions) - works locally
Source: GitHub Issue #10600
Why It Happens: @cloudflare/vitest-pool-workers has resource constraint issues in CI containers, affecting workflow tests more than other worker types.
Prevention:
testTimeout in vitest config:
export default defineWorkersConfig({
test: {
testTimeout: 60_000 // Default: 5000ms
}
});isolatedStorage: false if not testing storage isolationStatus: Known issue, investigating (Internal: WOR-945).
Error: Error: Not implemented yet when calling instance.restart() or instance.terminate()
Environment: Local development (wrangler dev) only - works in production
Source: GitHub Issue #11312
Why It Happens: Instance management APIs not yet implemented in miniflare. Additionally, instance status shows running even when workflow is sleeping.
Prevention: Test instance lifecycle management (pause/resume/terminate) in production or staging environment until local dev support is added.
const instance = await env.MY_WORKFLOW.get(instanceId);
// ❌ Fails in wrangler dev
await instance.restart(); // Error: Not implemented yet
await instance.terminate(); // Error: Not implemented yet
// ✅ Works in productionStatus: Known limitation, no timeline for local dev support.
Error: "Cannot perform I/O on behalf of a different request"
Source: Cloudflare runtime behavior
Why It Happens: Trying to use I/O objects created in one request context from another request handler.
Prevention: Always perform I/O within step.do() callbacks:
// ❌ Bad - I/O outside step
const response = await fetch('https://api.example.com/data');
const data = await response.json();
await step.do('use data', async () => {
return data; // This will fail!
});
// ✅ Good - I/O inside step
const data = await step.do('fetch data', async () => {
const response = await fetch('https://api.example.com/data');
return await response.json();
});Error: NonRetryableError with empty message causes retries in dev mode but works correctly in production Environment: Development-specific bug Source: GitHub Issue #10113
Why It Happens: Empty error messages are handled differently between miniflare and production runtime.
Prevention: Always provide a message to NonRetryableError:
// ❌ Retries in dev, exits in prod
throw new NonRetryableError('');
// ✅ Exits in both environments
throw new NonRetryableError('Validation failed');Status: Known issue, workaround documented.
Error: Variables declared outside step.do() reset to initial values after sleep/hibernation
Source: Cloudflare Workflows Rules
Why It Happens: Workflows hibernate when the engine detects no pending work. All in-memory state is lost during hibernation.
Prevention: Only use state returned from step.do() - everything else is ephemeral:
// ❌ BAD - In-memory variable lost on hibernation
let counter = 0;
export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
counter = await step.do('increment', async () => counter + 1);
await step.sleep('wait', '1 hour'); // ← Hibernates here, in-memory state lost
console.log(counter); // ❌ Will be 0, not 1!
}
}
// ✅ GOOD - State from step.do() return values persists
export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
const counter = await step.do('increment', async () => 1);
await step.sleep('wait', '1 hour');
console.log(counter); // ✅ Still 1
}
}Error: Steps re-run unnecessarily, performance degradation Source: Cloudflare Workflows Rules
Why It Happens: Step names act as cache keys. Using Date.now(), Math.random(), or other non-deterministic values causes new cache keys every run.
Prevention: Use static, deterministic step names:
// ❌ BAD - Non-deterministic step name
await step.do(`fetch-data-${Date.now()}`, async () => {
return await fetchExpensiveData();
});
// Every execution creates new cache key → step always re-runs
// ✅ GOOD - Deterministic step name
await step.do('fetch-data', async () => {
return await fetchExpensiveData();
});
// Same cache key → result reused on restart/retryError: Different promises resolve on restart, inconsistent behavior Source: Cloudflare Workflows Rules
Why It Happens: Non-deterministic operations outside steps run again on restart, potentially with different results.
Prevention: Keep all non-deterministic logic inside step.do():
// ❌ BAD - Race outside step
const fastest = await Promise.race([fetchA(), fetchB()]);
await step.do('use result', async () => fastest);
// On restart: race runs again, different promise might win
// ✅ GOOD - Race inside step
const fastest = await step.do('fetch fastest', async () => {
return await Promise.race([fetchA(), fetchB()]);
});
// On restart: cached result used, consistent behaviorError: Duplicate logs, metrics, or operations after workflow restart Source: Cloudflare Workflows Rules
Why It Happens: Code outside step.do() executes multiple times if the workflow restarts mid-execution.
Prevention: Put logging, metrics, and other side effects inside step.do():
// ❌ BAD - Side effect outside step
console.log('Workflow started'); // ← Logs multiple times on restart
await step.do('work', async () => { /* work */ });
// ✅ GOOD - Side effects inside step
await step.do('log start', async () => {
console.log('Workflow started'); // ← Logs once (cached)
});Error: Double charges, duplicate database writes after step timeout Source: Cloudflare Workflows Rules
Why It Happens: Steps retry individually. If an API call succeeds but the step times out before returning, the retry will call the API again.
Prevention: Guard non-idempotent operations with existence checks:
// ❌ BAD - Charge customer without check
await step.do('charge', async () => {
return await stripe.charges.create({ amount: 1000, customer: customerId });
});
// If step times out after charge succeeds, retry charges AGAIN!
// ✅ GOOD - Check for existing charge first
await step.do('charge', async () => {
const existing = await stripe.charges.list({ customer: customerId, limit: 1 });
if (existing.data.length > 0) return existing.data[0]; // Idempotent
return await stripe.charges.create({ amount: 1000, customer: customerId });
});step.do<T>(name: string, config?: WorkflowStepConfig, callback: () => Promise<T>): Promise<T>Parameters:
name - Step name (for observability)config (optional) - Retry configuration (retries, timeout, backoff)callback - Async function that does the workReturns: Value from callback (must be serializable)
Example:
const result = await step.do('call API', { retries: { limit: 10, delay: '10s', backoff: 'exponential' }, timeout: '5 min' }, async () => {
return await fetch('https://api.example.com/data').then(r => r.json());
});CRITICAL - Serialization:
step.sleep(name: string, duration: WorkflowDuration): Promise<void>Parameters:
name - Step nameduration - Number (ms) or string: "second", "minute", "hour", "day", "week", "month", "year" (plural forms accepted)Examples:
await step.sleep('wait 5 minutes', '5 minutes');
await step.sleep('wait 1 hour', '1 hour');
await step.sleep('wait 2 days', '2 days');
await step.sleep('wait 30 seconds', 30000); // millisecondsNote: Resuming workflows take priority over new instances. Sleeps don't count toward step limits.
step.sleepUntil(name: string, timestamp: Date | number): Promise<void>Parameters:
name - Step nametimestamp - Date object or UNIX timestamp (milliseconds)Examples:
await step.sleepUntil('wait for launch', new Date('2025-12-25T00:00:00Z'));
await step.sleepUntil('wait until time', Date.parse('24 Oct 2024 13:00:00 UTC'));step.waitForEvent<T>(name: string, options: { type: string; timeout?: string | number }): Promise<T>Parameters:
name - Step nameoptions.type - Event type to matchoptions.timeout (optional) - Max wait time (default: 24 hours, max: 30 days)Returns: Event payload sent via instance.sendEvent()
Example:
export class PaymentWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
await step.do('create payment', async () => { /* Stripe API */ });
const webhookData = await step.waitForEvent<StripeWebhook>(
'wait for payment confirmation',
{ type: 'stripe-webhook', timeout: '1 hour' }
);
if (webhookData.status === 'succeeded') {
await step.do('fulfill order', async () => { /* fulfill */ });
}
}
}
// Worker sends event to workflow
export default {
async fetch(req: Request, env: Env): Promise<Response> {
if (req.url.includes('/webhook/stripe')) {
const instance = await env.PAYMENT_WORKFLOW.get(instanceId);
await instance.sendEvent({ type: 'stripe-webhook', payload: await req.json() });
return new Response('OK');
}
}
};Timeout handling:
try {
const event = await step.waitForEvent('wait for user', { type: 'user-submitted', timeout: '10 minutes' });
} catch (error) {
await step.do('send reminder', async () => { /* reminder */ });
}interface WorkflowStepConfig {
retries?: {
limit: number; // Max attempts (Infinity allowed)
delay: string | number; // Delay between retries
backoff?: 'constant' | 'linear' | 'exponential';
};
timeout?: string | number; // Max time per attempt
}Default: { retries: { limit: 5, delay: 10000, backoff: 'exponential' }, timeout: '10 minutes' }
Backoff Examples:
// Constant: 30s, 30s, 30s
{ retries: { limit: 3, delay: '30 seconds', backoff: 'constant' } }
// Linear: 1m, 2m, 3m, 4m, 5m
{ retries: { limit: 5, delay: '1 minute', backoff: 'linear' } }
// Exponential (recommended): 10s, 20s, 40s, 80s, 160s
{ retries: { limit: 10, delay: '10 seconds', backoff: 'exponential' }, timeout: '5 minutes' }
// Unlimited retries
{ retries: { limit: Infinity, delay: '1 minute', backoff: 'exponential' } }
// No retries
{ retries: { limit: 0 } }Force workflow to fail immediately without retrying:
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from 'cloudflare:workers';
import { NonRetryableError } from 'cloudflare:workflows';
export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
await step.do('validate input', async () => {
if (!event.payload.userId) {
throw new NonRetryableError('userId is required');
}
// Validate user exists
const user = await this.env.DB.prepare(
'SELECT * FROM users WHERE id = ?'
).bind(event.payload.userId).first();
if (!user) {
// Terminal error - retrying won't help
throw new NonRetryableError('User not found');
}
return user;
});
}
}When to use NonRetryableError:
Prevent workflow failure by catching optional step errors:
export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
await step.do('process payment', async () => { /* critical */ });
try {
await step.do('send email', async () => { /* optional */ });
} catch (error) {
await step.do('log failure', async () => {
await this.env.DB.prepare('INSERT INTO failed_emails VALUES (?, ?)').bind(event.payload.userId, error.message).run();
});
}
await step.do('update status', async () => { /* continues */ });
}
}Graceful Degradation:
let result;
try {
result = await step.do('call primary API', async () => await callPrimaryAPI());
} catch {
result = await step.do('call backup API', async () => await callBackupAPI());
}Configure binding (wrangler.jsonc):
{
"workflows": [{
"name": "my-workflow",
"binding": "MY_WORKFLOW",
"class_name": "MyWorkflow",
"script_name": "workflow-worker" // If workflow in different Worker
}]
}Trigger from Worker:
const instance = await env.MY_WORKFLOW.create({ params: { userId: '123' } });
return Response.json({ id: instance.id, status: await instance.status() });Instance Management:
const instance = await env.MY_WORKFLOW.get(instanceId);
const status = await instance.status(); // { status: 'running'|'complete'|'errored'|'queued', error, output }
await instance.sendEvent({ type: 'user-action', payload: { action: 'approved' } });
await instance.pause();
await instance.resume();
await instance.terminate();Workflows automatically persist state returned from step.do():
✅ Serializable:
string, number, boolean, null❌ Non-Serializable:
Example:
// ✅ Good
const result = await step.do('fetch data', async () => ({
users: [{ id: 1, name: 'Alice' }],
timestamp: Date.now(),
metadata: null
}));
// ❌ Bad - function not serializable
const bad = await step.do('bad', async () => ({ data: [1, 2, 3], transform: (x) => x * 2 })); // Throws error!Access State Across Steps:
const userData = await step.do('fetch user', async () => ({ id: 123, email: 'user@example.com' }));
const orderData = await step.do('create order', async () => ({ userId: userData.id, orderId: 'ORD-456' }));
await step.do('send email', async () => sendEmail({ to: userData.email, subject: `Order ${orderData.orderId}` }));Workflows automatically track:
Access via Cloudflare dashboard:
Metrics include:
const instance = await env.MY_WORKFLOW.get(instanceId);
const status = await instance.status();
console.log(status);
// {
// status: 'complete' | 'running' | 'errored' | 'queued' | 'waiting' | 'unknown',
// error: string | null,
// output: { userId: '123', status: 'processed' }
// }CPU Time Configuration (2025):
// wrangler.jsonc
{ "limits": { "cpu_ms": 300000 } } // 5 minutes max (default: 30 seconds)| Feature | Workers Free | Workers Paid |
|---|---|---|
| Max steps per workflow | 1,024 | 1,024 |
| Max state per step | 1 MiB | 1 MiB |
| Max state per instance | 100 MB | 1 GB |
| Max event payload size | 1 MiB | 1 MiB |
| Max sleep/sleepUntil duration | 365 days | 365 days |
| Max waitForEvent timeout | 365 days | 365 days |
| CPU time per step | 10 ms | 30 sec (default), 5 min (max) |
| Duration (wall clock) per step | Unlimited | Unlimited |
| Max workflow executions | 100,000/day | Unlimited |
| Concurrent instances | 25 | 10,000 (Oct 2025, up from 4,500) |
| Instance creation rate | 100/second | 100/second (Oct 2025, 10x faster) |
| Max queued instances | 100,000 | 1,000,000 |
| Max subrequests per instance | 50/request | 1,000/request |
| Retention (completed state) | 3 days | 30 days |
| Max Workflow name length | 64 chars | 64 chars |
| Max instance ID length | 100 chars | 100 chars |
CRITICAL Notes:
step.sleep() and step.sleepUntil() do NOT count toward 1,024 step limitwrangler.jsonc: { "limits": { "cpu_ms": 300000 } } (5 min max)Requires Workers Paid plan ($5/month)
Workflow Executions:
What counts as a step execution:
step.do() callstep.sleep(), step.sleepUntil(), step.waitForEvent() do NOT countCost examples:
Workflows support full testing integration via cloudflare:test module.
npm install -D vitest@latest @cloudflare/vitest-pool-workers@latestvitest.config.ts:
import { defineWorkersConfig } from '@cloudflare/vitest-pool-workers/config';
export default defineWorkersConfig({ test: { poolOptions: { workers: { miniflare: { bindings: { MY_WORKFLOW: { scriptName: 'workflow' } } } } } } });import { env, introspectWorkflowInstance } from 'cloudflare:test';
it('should complete workflow', async () => {
const instance = await introspectWorkflowInstance(env.MY_WORKFLOW, 'test-123');
try {
await instance.modify(async (m) => {
await m.disableSleeps(); // Skip all sleeps
await m.mockStepResult({ name: 'fetch data' }, { users: [{ id: 1 }] }); // Mock step result
await m.mockEvent({ type: 'approval', payload: { approved: true } }); // Send mock event
await m.mockStepError({ name: 'call API' }, new Error('Network timeout'), 1); // Force error once
});
await env.MY_WORKFLOW.create({ id: 'test-123' });
await expect(instance.waitForStatus('complete')).resolves.not.toThrow();
} finally {
await instance.dispose(); // Cleanup
}
});disableSleeps(steps?) - Skip sleeps instantlymockStepResult(step, result) - Mock step.do() resultmockStepError(step, error, times?) - Force step.do() to throwmockEvent(event) - Send mock event to step.waitForEvent()forceStepTimeout(step, times?) - Force step.do() timeoutforceEventTimeout(step) - Force step.waitForEvent() timeoutOfficial Docs: https://developers.cloudflare.com/workers/testing/vitest-integration/
mcp__cloudflare-docs__search_cloudflare_documentation for latest docsLast Updated: 2026-01-21 Version: 2.0.0 Changes: Added 12 documented Known Issues (TIER 1-2 research findings): waitForEvent timeout bug, getPlatformProxy failure, redirect instance loss, Vitest CI issues, local dev limitations, state persistence rules, caching gotchas, and idempotency patterns Maintainer: Jeremy Dawes | jeremy@jezweb.net
fa91c34
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.