
Cloudflare AI Workers — Deploy AI Apps at the Edge

Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.

TL;DR
Workers AI runs LLMs, embeddings, and image models on Cloudflare's edge network with zero infrastructure management.
§01

What it is

Workers AI lets you run AI inference on Cloudflare's global edge network. It provides serverless access to models including Meta Llama, Stable Diffusion, Whisper, and embedding models. You write a few lines of TypeScript, deploy to Cloudflare, and your AI app runs close to users across 300+ data centers.

This workflow is for developers who want low-latency AI inference without provisioning GPUs or managing model serving infrastructure. It suits prototyping, production APIs, and edge-first applications where latency matters.

§02

How it saves time or tokens

The workflow provides a ready-to-deploy project scaffold. Instead of configuring model endpoints, GPU instances, or container orchestration, you get a single npm create command that generates a working Cloudflare Worker with AI bindings. Deployment is one command away, and Cloudflare handles scaling, routing, and model hosting.
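The generated project declares the AI binding in its wrangler configuration. A minimal sketch of the relevant wrangler.toml, assuming the scaffold's default names (the compatibility date here is illustrative):

name = "my-ai-app"
main = "src/index.ts"
compatibility_date = "2026-04-07"  # illustrative; use your project's date

# Exposes Workers AI to the Worker as env.AI
[ai]
binding = "AI"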

§03

How to use

  1. Scaffold a new Cloudflare AI project:
npm create cloudflare@latest my-ai-app
cd my-ai-app
  2. Write your Worker with AI bindings:
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding declared in wrangler.toml
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'user', content: 'Explain edge computing in one paragraph' }
        ]
      }
    );
    // Response.json sets the Content-Type header for us
    return Response.json(response);
  }
};
  3. Deploy to Cloudflare's edge:
npx wrangler deploy
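After deployment, Wrangler prints your Worker's URL on your workers.dev subdomain. You can sanity-check it with a plain HTTP request (the hostname below is a placeholder for your own subdomain):

curl https://my-ai-app.<your-subdomain>.workers.dev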
§04

Example

// Text embeddings for semantic search
export default {
  async fetch(request, env) {
    const text = 'How do I deploy an AI model?';
    // bge-base-en-v1.5 maps each input string to a 768-dimension vector
    const embeddings = await env.AI.run(
      '@cf/baai/bge-base-en-v1.5',
      { text: [text] }
    );
    return Response.json({ embeddings: embeddings.data });
  }
};
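To build semantic search on top of these vectors, you compare a query embedding against stored document embeddings, usually with cosine similarity. A minimal helper sketch, independent of any Cloudflare API:

// Cosine similarity between two equal-length embedding vectors
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

In production you would typically store document vectors in a vector database (Cloudflare's Vectorize pairs naturally with Workers AI) rather than comparing them by hand.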
§05

Common pitfalls

  • Workers AI model availability varies by plan. Free tier has request limits and not all models are available. Check the Cloudflare dashboard for your plan's model catalog.
  • Response streaming is off by default: the basic fetch pattern above buffers and returns the full response at once. To stream, pass the stream option and return the resulting readable stream from your Worker (see the sketch after this list).
  • Cold starts on less-popular models can add latency on the first request. Frequently used models stay warm across Cloudflare's network.
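A minimal streaming sketch using the stream option, which makes AI.run resolve to a ReadableStream of Server-Sent Events that the Worker can return directly:

export default {
  async fetch(request, env) {
    // stream: true makes AI.run resolve to a ReadableStream of SSE chunks
    const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'Explain edge computing' }],
      stream: true
    });
    // Pass the stream through so tokens reach the client as they are generated
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' }
    });
  }
};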

Frequently Asked Questions

What AI models are available on Workers AI?

Workers AI supports text generation models (Meta Llama family), embedding models (BGE, sentence-transformers), image generation (Stable Diffusion), speech-to-text (Whisper), and more. The model catalog is updated regularly on the Cloudflare developer docs.

How much does Workers AI cost?

Workers AI includes a free tier with a limited number of neurons (Cloudflare's billing unit for AI inference). Paid plans charge per neuron consumed. Exact pricing varies by model size and is detailed on the Cloudflare pricing page.

Can I use Workers AI with my own fine-tuned models?

Workers AI currently supports Cloudflare's curated model catalog. You cannot upload custom weights. If you need custom models, consider using Workers as a proxy to your own inference endpoint.
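A rough sketch of that proxy pattern, assuming hypothetical INFERENCE_URL and INFERENCE_API_KEY values you would configure yourself (for example with wrangler secret put):

export default {
  async fetch(request, env) {
    // Read the incoming body and forward it to your own inference endpoint.
    // INFERENCE_URL and INFERENCE_API_KEY are hypothetical names for
    // values configured on the Worker, not part of the Workers AI API.
    const body = await request.text();
    return fetch(env.INFERENCE_URL, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${env.INFERENCE_API_KEY}`
      },
      body
    });
  }
};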

Does Workers AI support streaming responses?

Yes. You can stream text generation responses token by token using Server-Sent Events. This requires using the stream option in the AI.run call and returning a readable stream from your Worker.

What regions does Workers AI run in?

Workers AI runs on Cloudflare's global network of 300+ data centers. Requests are routed to the nearest data center that has the requested model available. GPU-accelerated inference is available in select locations.
