Apr 7, 2026 · 2 min read

Cloudflare AI Workers — Deploy AI Apps at the Edge

Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.

Quick Use

Use it first, then decide how deep to go

Scaffold a project, drop the handler below into src/index.js, and deploy:

npm create cloudflare@latest my-ai-app
cd my-ai-app

// src/index.js — requires an AI binding named "AI" in your wrangler config
export default {
  async fetch(request, env) {
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [{ role: "user", content: "What is Cloudflare?" }],
    });
    return Response.json(response);
  },
};

npx wrangler deploy
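The handler above depends on an AI binding being configured. A minimal wrangler.toml sketch — the project name and compatibility date are placeholders to adjust for your project:

```toml
name = "my-ai-app"
main = "src/index.js"
compatibility_date = "2026-04-07"

# Exposes Workers AI inside the Worker as env.AI
[ai]
binding = "AI"
```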

What is Cloudflare Workers AI?

Workers AI lets you run AI models on Cloudflare's global edge network — 300+ cities worldwide. It provides serverless inference for LLMs, text embeddings, image generation, speech-to-text, and more with no GPU management, automatic scaling, and pay-per-request pricing.

Answer-Ready: Cloudflare Workers AI provides serverless AI inference on a global edge network (300+ cities). Run Llama, Mistral, Stable Diffusion, and Whisper models with no GPU management, auto-scaling, and pay-per-request pricing.

Best for: Developers building AI features who want low-latency, serverless deployment. Works with: Llama 3, Mistral, Stable Diffusion, Whisper, BAAI embeddings. Setup time: Under 5 minutes.

Core Features

1. Pre-Built Model Catalog

Category         | Models
Text Generation  | Llama 3.1 (8B/70B), Mistral 7B, Gemma
Embeddings       | BAAI bge-base, bge-large
Image Generation | Stable Diffusion XL, FLUX.1
Speech-to-Text   | Whisper
Translation      | Meta M2M-100
Classification   | BERT, DistilBERT
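In practice the catalog works as a lookup from task to model ID. A small sketch — the text-generation and embeddings IDs come from the snippets in this article; the Stable Diffusion and Whisper IDs are assumptions, so confirm them against the live catalog (e.g. in the Cloudflare dashboard) before relying on them:

```javascript
// Map each catalog category to a default model ID.
// The first two IDs appear elsewhere in this article; the last two are
// assumed — verify against the current Workers AI catalog.
const DEFAULT_MODELS = {
  "text-generation": "@cf/meta/llama-3.1-8b-instruct",
  "embeddings": "@cf/baai/bge-base-en-v1.5",
  "image-generation": "@cf/stabilityai/stable-diffusion-xl-base-1.0",
  "speech-to-text": "@cf/openai/whisper",
};

function pickModel(category) {
  const model = DEFAULT_MODELS[category];
  if (!model) throw new Error(`No default model for category: ${category}`);
  return model;
}

// In a Worker:
//   const out = await env.AI.run(pickModel("embeddings"), { text: ["hello"] });
```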

2. Vectorize (Built-In Vector DB)

// Vectorize index binding (configured in wrangler.toml, not created here)
const index = env.VECTORIZE_INDEX;

// Generate an embedding — data[0] is the vector for the first input text
const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["document text here"],
});
await index.upsert([{ id: "doc1", values: embedding.data[0], metadata: { title: "..." } }]);

// Query the 5 nearest vectors
const results = await index.query(queryVector, { topK: 5 });
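Query results feed naturally into a retrieval-augmented prompt. A sketch of the glue step — the match shape (score plus the metadata stored at upsert time) follows the snippet above, while the score threshold and prompt wording are assumptions:

```javascript
// Turn Vectorize query matches into a context string for a RAG prompt.
// Each match carries the metadata stored at upsert time (here: title).
function buildContext(matches, minScore = 0.5) {
  return matches
    .filter((m) => m.score >= minScore)
    .map((m) => `- ${m.metadata.title} (score ${m.score.toFixed(2)})`)
    .join("\n");
}

// In a Worker, after `const results = await index.query(queryVector, { topK: 5 })`:
//   const context = buildContext(results.matches);
//   await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
//     messages: [
//       { role: "system", content: `Answer using only this context:\n${context}` },
//       { role: "user", content: question },
//     ],
//   });
```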

3. AI Gateway

Route, cache, and monitor AI API calls:

// Replace {account} with your Cloudflare account ID and my-gateway with your gateway name
const response = await fetch("https://gateway.ai.cloudflare.com/v1/{account}/my-gateway/openai/chat/completions", {
  method: "POST",
  headers: { "Authorization": "Bearer sk-...", "Content-Type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o", messages: [...] }),
});

Features: caching, rate limiting, fallbacks, analytics, logging.
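The gateway URL follows a fixed shape: base, account ID, gateway name, provider, then the provider's own path. A tiny helper — a sketch derived from the URL in the snippet above, with placeholder arguments:

```javascript
// Compose an AI Gateway URL: base + account ID + gateway name + provider + path.
function gatewayUrl(accountId, gateway, provider, path) {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gateway}/${provider}/${path}`;
}

// Usage:
//   fetch(gatewayUrl(ACCOUNT_ID, "my-gateway", "openai", "chat/completions"), { ... })
```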

4. Edge Deployment

Models run on Cloudflare's GPU fleet across 300+ cities:

  • P50 latency: < 50ms for embeddings
  • Auto-scaling: 0 to millions of requests
  • No cold starts for popular models

5. Pay-Per-Request Pricing

Resource          | Free Tier  | Paid
Neurons (compute) | 10,000/day | $0.011 per 1,000
Vectorize queries | 30M/mo     | $0.01 per 1M
Storage           | 5M vectors | $0.05 per 1M
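With the rates above, estimating monthly compute cost is simple arithmetic. A sketch, assuming usage is spread evenly so the 10,000-neuron daily free allowance applies each day:

```javascript
// Estimate monthly neuron cost: daily usage minus the daily free allowance,
// billed at $0.011 per 1,000 neurons (rates from the pricing table above).
function monthlyNeuronCost(neuronsPerDay, days = 30) {
  const FREE_PER_DAY = 10_000;
  const PRICE_PER_1000 = 0.011;
  const billablePerDay = Math.max(0, neuronsPerDay - FREE_PER_DAY);
  return (billablePerDay / 1000) * PRICE_PER_1000 * days;
}

// e.g. 100,000 neurons/day → 90,000 billable/day → ≈ $29.70/month
```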

FAQ

Q: Can I use my own fine-tuned models? A: Yes, via LoRA adapters on supported base models.

Q: How does it compare to AWS Bedrock? A: Workers AI is edge-native (lower latency globally), simpler to use, and cheaper for small-to-medium workloads. Bedrock offers more enterprise models.

Q: Is there a free tier? A: Yes, 10,000 neurons/day free — enough for ~100-200 LLM requests.
