Cloudflare AI Workers — Deploy AI Apps at the Edge
Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.
What it is
Cloudflare AI Workers lets you run AI inference on Cloudflare's global edge network. Workers AI provides serverless access to models including Meta Llama, Stable Diffusion, Whisper, and embedding models. You write a few lines of TypeScript, deploy to Cloudflare, and your AI app runs close to users across 300+ data centers.
This workflow is for developers who want low-latency AI inference without provisioning GPUs or managing model serving infrastructure. It suits prototyping, production APIs, and edge-first applications where latency matters.
How it saves time or tokens
The workflow provides a ready-to-deploy project scaffold. Instead of configuring model endpoints, GPU instances, or container orchestration, you get a single npm create command that generates a working Cloudflare Worker with AI bindings. Deployment is one command away, and Cloudflare handles scaling, routing, and model hosting.
How to use
- Scaffold a new Cloudflare AI project:
npm create cloudflare@latest my-ai-app
cd my-ai-app
- Write your Worker with AI bindings:
export default {
  async fetch(request, env) {
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'user', content: 'Explain edge computing in one paragraph' }
        ]
      }
    );
    return new Response(JSON.stringify(response));
  }
};
- Deploy to Cloudflare's edge:
npx wrangler deploy
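The `env.AI` binding used in the Worker above must be declared in the project's `wrangler.toml`. The scaffold generates this for you, but if you are adding AI to an existing Worker, the binding looks like the following minimal sketch (the `name`, `main`, and `compatibility_date` values are illustrative):

```toml
# wrangler.toml — declares the Workers AI binding exposed to the Worker as env.AI
name = "my-ai-app"
main = "src/index.ts"
compatibility_date = "2024-09-01"

[ai]
binding = "AI"
```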
Example
// Text embeddings for semantic search
export default {
  async fetch(request, env) {
    const text = 'How do I deploy an AI model?';
    const embeddings = await env.AI.run(
      '@cf/baai/bge-base-en-v1.5',
      { text: [text] }
    );
    return Response.json({ embeddings: embeddings.data });
  }
};
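Embedding vectors become useful once you compare them. A minimal ranking helper might look like the sketch below; the `cosineSimilarity` and `bestMatch` names are illustrative, not part of the Workers AI API, and the sketch assumes BGE embeddings arrive as plain number arrays:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the index of the document vector closest to the query vector.
function bestMatch(query: number[], docs: number[][]): number {
  let best = -1, bestScore = -Infinity;
  docs.forEach((doc, i) => {
    const score = cosineSimilarity(query, doc);
    if (score > bestScore) { bestScore = score; best = i; }
  });
  return best;
}
```

In practice you would embed your document corpus once, store the vectors (for example in Vectorize or KV), and at request time embed only the query before ranking.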
Related on TokRepo
- AI gateway providers -- Cloudflare AI Gateway for routing and caching AI requests
- DevOps tools -- Infrastructure and deployment automation tools
Common pitfalls
- Workers AI model availability varies by plan. Free tier has request limits and not all models are available. Check the Cloudflare dashboard for your plan's model catalog.
- Response streaming requires specific Worker syntax with TransformStream. The basic fetch pattern returns the full response at once.
- Cold starts on less-popular models can add latency on the first request. Frequently-used models stay warm across Cloudflare's network.
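The streaming pattern mentioned in the pitfalls above can be sketched as follows. This assumes that passing `stream: true` makes `env.AI.run` resolve to a `ReadableStream` of server-sent-event chunks that can be handed directly to `Response`, per Cloudflare's streaming docs; the prompt and model name are illustrative:

```typescript
// Sketch: stream generated tokens back to the client as server-sent events.
const worker = {
  async fetch(request: Request, env: { AI: { run: Function } }): Promise<Response> {
    // Assumption: with stream: true, run() resolves to a ReadableStream of SSE chunks.
    const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'Explain edge computing' }],
      stream: true,
    });
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' },
    });
  },
};

export default worker;
```

The browser (or an `EventSource` client) then receives tokens as they are generated instead of waiting for the full completion.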
Frequently Asked Questions
Which models does Workers AI support?
Workers AI supports text generation models (Meta Llama family), embedding models (BGE, sentence-transformers), image generation (Stable Diffusion), speech-to-text (Whisper), and more. The model catalog is updated regularly on the Cloudflare developer docs.
How much does Workers AI cost?
Workers AI includes a free tier with a limited number of neurons (Cloudflare's billing unit for AI inference). Paid plans charge per neuron consumed. Exact pricing varies by model size and is detailed on the Cloudflare pricing page.
Can I run my own custom models?
Workers AI currently supports Cloudflare's curated model catalog. You cannot upload custom weights. If you need custom models, consider using Workers as a proxy to your own inference endpoint.
Can I stream responses?
Yes. You can stream text generation responses token by token using Server-Sent Events. This requires using the stream option in the AI.run call and returning a readable stream from your Worker.
Where does Workers AI run?
Workers AI runs on Cloudflare's global network of 300+ data centers. Requests are routed to the nearest data center that has the requested model available. GPU-accelerated inference is available in select locations.
Citations (3)
- Cloudflare Workers AI Docs — Workers AI provides serverless inference on Cloudflare's edge network
- Workers AI Models — Supports Meta Llama, Stable Diffusion, Whisper, and embedding models
- Cloudflare Network — Cloudflare operates 300+ data centers globally
Source & Thanks
Created by Cloudflare.
Documentation: developers.cloudflare.com/workers-ai