Cloudflare AI Workers — Deploy AI Apps at the Edge
Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.
Review-first install path
This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.
npx -y tokrepo@latest install bd8d0961-db4e-4890-828e-095163614679 --target codex
Dry-run first, confirm the writes, then run this command.
What it is
Workers AI lets you run AI inference on Cloudflare's global edge network. It provides serverless access to models including Meta Llama, Stable Diffusion, Whisper, and embedding models. You write a few lines of TypeScript, deploy to Cloudflare, and your AI app runs close to users across 300+ data centers.
This workflow is for developers who want low-latency AI inference without provisioning GPUs or managing model serving infrastructure. It suits prototyping, production APIs, and edge-first applications where latency matters.
How it saves time or tokens
The workflow provides a ready-to-deploy project scaffold. Instead of configuring model endpoints, GPU instances, or container orchestration, you get a single npm create command that generates a working Cloudflare Worker with AI bindings. Deployment is one command away, and Cloudflare handles scaling, routing, and model hosting.
How to use
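- Scaffold a new Cloudflare AI project:
    npm create cloudflare@latest my-ai-app
    cd my-ai-app
- Write your Worker with AI bindings. The code reaches the models through env.AI, so the AI binding must be declared in the project's Wrangler configuration (in wrangler.toml, an [ai] section with binding = "AI"):
    export default {
      async fetch(request, env) {
        // Send a single chat message to the model through the AI binding.
        const response = await env.AI.run(
          '@cf/meta/llama-3.1-8b-instruct',
          {
            messages: [
              { role: 'user', content: 'Explain edge computing in one paragraph' }
            ]
          }
        );
        // Response.json serializes the result and sets a JSON content type.
        return Response.json(response);
      }
    };
- Deploy to Cloudflare's edge:
    npx wrangler deploy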
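The Worker from the steps above could be written with explicit types roughly as follows. This is a sketch under assumptions, not the scaffold's actual output: the `AiBinding` interface is a hand-written minimal shape of the binding, the `?prompt=` query parameter is a hypothetical convention, and returning a 502 on failure is one possible choice.

```typescript
// Minimal hand-written shape of the AI binding (an assumption for this sketch).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

const MODEL = "@cf/meta/llama-3.1-8b-instruct";

// Small helper that builds a JSON response with an explicit content type.
function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Hypothetical convention: take the prompt from ?prompt=..., else use a default.
    const url = new URL(request.url);
    const prompt =
      url.searchParams.get("prompt") ?? "Explain edge computing in one paragraph";

    try {
      const result = await env.AI.run(MODEL, {
        messages: [{ role: "user", content: prompt }],
      });
      return jsonResponse(result);
    } catch (err) {
      // Report inference failures as a 502 instead of an unhandled exception.
      const message = err instanceof Error ? err.message : String(err);
      return jsonResponse({ error: message }, 502);
    }
  },
};

// In a Worker project this object would be the module's default export:
// export default worker;
```

Passing the binding in through `env` keeps the handler easy to exercise locally with a stand-in object before deploying.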
Example
// Text embeddings for semantic search
export default {
  async fetch(request, env) {
    const text = 'How do I deploy an AI model?';
    const embeddings = await env.AI.run(
      '@cf/baai/bge-base-en-v1.5',
      { text: [text] }
    );
    return Response.json({ embeddings: embeddings.data });
  }
};
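To turn vectors like the ones returned above into a semantic search, a common approach is to rank documents by cosine similarity to the query vector. The sketch below assumes the vectors are plain number arrays (as in `embeddings.data`); `cosineSimilarity` and `rankBySimilarity` are illustrative helper names, not part of Workers AI.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  text: string;
  vector: number[];
}

interface ScoredDoc {
  text: string;
  score: number;
}

// Score every document against the query and sort best match first.
function rankBySimilarity(query: number[], docs: EmbeddedDoc[]): ScoredDoc[] {
  return docs
    .map((doc) => ({ text: doc.text, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

For a handful of documents this linear scan is enough; larger collections would normally sit behind a vector index instead.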
Related on TokRepo
- AI gateway providers -- Cloudflare AI Gateway for routing and caching AI requests
- DevOps tools -- Infrastructure and deployment automation tools
Common pitfalls
- Workers AI model availability and usage limits vary by plan. The free tier caps usage, and not every model is available on it. Check the Cloudflare dashboard for your plan's model catalog.
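- Response streaming is opt-in. The basic fetch pattern returns the full response at once; to stream, pass the stream option to env.AI.run and return the resulting readable stream from the Worker with a text/event-stream content type.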
- Cold starts on less-popular models can add latency on the first request. Frequently-used models stay warm across Cloudflare's network.
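The streaming pitfall above can be sketched as a Worker that passes `stream: true` and returns the stream directly. This is a minimal illustration, assuming the binding resolves to a readable stream of Server-Sent Events when streaming is requested; the `AiBinding` interface is a hand-written minimal shape, not the official type.

```typescript
// Minimal hand-written shape of the AI binding (an assumption for this sketch).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

const streamingWorker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // With stream: true the result is expected to be a byte stream of
    // Server-Sent Events rather than a complete JSON object.
    const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "user", content: "Explain edge computing in one paragraph" },
      ],
      stream: true,
    });

    // Hand the stream straight to the client; nothing is buffered here.
    return new Response(stream as ReadableStream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};

// In a Worker project this object would be the module's default export:
// export default streamingWorker;
```

Because the body is passed through untouched, the client starts receiving tokens as soon as the model produces them.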
Frequently Asked Questions
What models does Workers AI support?
Workers AI supports text generation models (Meta Llama family), embedding models (BGE, sentence-transformers), image generation (Stable Diffusion), speech-to-text (Whisper), and more. The model catalog is updated regularly on the Cloudflare developer docs.
How much does Workers AI cost?
Workers AI includes a free tier with a limited number of neurons (Cloudflare's billing unit for AI inference). Paid plans charge per neuron consumed. Exact pricing varies by model size and is detailed on the Cloudflare pricing page.
Can I run my own custom models?
Workers AI currently supports Cloudflare's curated model catalog. You cannot upload custom weights. If you need custom models, consider using Workers as a proxy to your own inference endpoint.
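The proxy idea mentioned above might look like the following. Everything specific here is hypothetical: the upstream URL, the bearer-token header, and the `proxyToCustomModel` helper are placeholders for whatever your own inference endpoint expects. The `fetchImpl` parameter exists only so the function can be exercised without a network call.

```typescript
// Hypothetical upstream; replace with your own inference endpoint.
const UPSTREAM_URL = "https://inference.example.com/v1/generate";

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function proxyToCustomModel(
  request: Request,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  // Only forward POST requests that carry a prompt payload.
  if (request.method !== "POST") {
    return new Response("Method Not Allowed", { status: 405 });
  }

  const upstream = await fetchImpl(UPSTREAM_URL, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Hypothetical auth scheme; the key stays server-side in the Worker.
      authorization: `Bearer ${apiKey}`,
    },
    body: await request.text(),
  });

  // Pass the upstream status and body through unchanged.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "content-type": upstream.headers.get("content-type") ?? "application/json",
    },
  });
}
```

Inside a Worker's fetch handler this would be called as `proxyToCustomModel(request, env.API_KEY)`, where `API_KEY` is an assumed secret name.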
Does Workers AI support streaming responses?
Yes. You can stream text generation responses token by token using Server-Sent Events. This requires using the stream option in the AI.run call and returning a readable stream from your Worker.
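On the consuming side, the Server-Sent Events payload has to be parsed back into text. The sketch below assumes each event is a line of the form `data: {"response":"..."}` and that the stream ends with `data: [DONE]`; treat that shape as an assumption to verify against the model you use. `collectSseText` is an illustrative helper that works on an already-buffered string, not on a live stream.

```typescript
// Extract the generated text from a buffered Server-Sent Events payload.
// Assumed event shape: `data: {"response":"..."}`, terminated by `data: [DONE]`.
function collectSseText(payload: string): string {
  let text = "";
  for (const line of payload.split("\n")) {
    const trimmed = line.trim();
    // Skip blank separator lines and any non-data fields.
    if (!trimmed.startsWith("data:")) continue;

    const data = trimmed.slice("data:".length).trim();
    if (data === "[DONE]") break;

    try {
      const event = JSON.parse(data) as { response?: string };
      text += event.response ?? "";
    } catch {
      // Ignore malformed or partial events in this simplified sketch.
    }
  }
  return text;
}
```

A live client would read chunks incrementally and keep any trailing partial line for the next chunk; this version skips that bookkeeping to stay short.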
Where does Workers AI inference run?
Workers AI runs on Cloudflare's global network of 300+ data centers. Requests are routed to the nearest data center that has the requested model available. GPU-accelerated inference is available in select locations.
Source & Thanks
Created by Cloudflare.
Documentation: developers.cloudflare.com/workers-ai
Related Assets
Cloudflare Workers AI — Serverless AI Inference
Run AI models at the edge with Cloudflare Workers. Text generation, image generation, speech-to-text, translation, embeddings — all serverless with global distribution.
workerd — Cloudflare Workers JavaScript Runtime
workerd is the open-source JavaScript and WebAssembly runtime that powers Cloudflare Workers. It can be self-hosted to run Workers-compatible code locally or on your own infrastructure with the same V8 isolate-based execution model.
Cloudflare Agents — Stateful Agents on Durable Objects
Cloudflare Agents provides stateful execution environments for agent workloads on Durable Objects, with scheduling, realtime, MCP, and Workers deployment.
Serverless Framework — Build and Deploy Serverless Apps to Any Cloud
The most widely adopted toolkit for building serverless applications on AWS Lambda, Azure Functions, Google Cloud Functions, and more. Define infrastructure and functions in a single YAML file and deploy with one command.