# Cloudflare AI Workers — Deploy AI Apps at the Edge

> Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
npm create cloudflare@latest my-ai-app
cd my-ai-app
```

```typescript
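// Assumes an AI binding named "AI" is configured in the Wrangler config
// and that the `Ai` type comes from @cloudflare/workers-types.
interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Run a chat-style prompt against a hosted Llama model.
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [{ role: "user", content: "What is Cloudflare?" }],
    });
    return Response.json(response);
  },
};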
```
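The Worker above hardcodes its question. A minimal sketch of building the `messages` payload from caller-supplied text follows; `buildChatInput` and `MAX_PROMPT_CHARS` are hypothetical names introduced for illustration, and the length cap is an arbitrary assumption rather than a Workers AI limit.

```typescript
// Hypothetical cap on prompt length; not a documented Workers AI limit.
const MAX_PROMPT_CHARS = 2000;

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build the input object passed as the second argument to env.AI.run().
function buildChatInput(prompt: string, system?: string): { messages: ChatMessage[] } {
  const content = prompt.trim().slice(0, MAX_PROMPT_CHARS);
  if (content.length === 0) {
    throw new Error("prompt must not be empty");
  }
  const messages: ChatMessage[] = [];
  if (system) {
    // An optional system message goes first to steer the model.
    messages.push({ role: "system", content: system });
  }
  messages.push({ role: "user", content });
  return { messages };
}
```

Inside `fetch`, the result could be passed straight through, for example `env.AI.run("@cf/meta/llama-3.1-8b-instruct", buildChatInput(userText))`.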

```bash
npx wrangler deploy
```

## What is Cloudflare Workers AI?

Workers AI lets you run AI models on Cloudflare's global edge network — 300+ cities worldwide. It provides serverless inference for LLMs, text embeddings, image generation, speech-to-text, and more with no GPU management, automatic scaling, and pay-per-request pricing.

**Answer-Ready**: Cloudflare Workers AI provides serverless AI inference on a global edge network (300+ cities). Run Llama, Mistral, Stable Diffusion, and Whisper models with no GPU management, auto-scaling, and pay-per-request pricing.

**Best for**: Developers building AI features who want low-latency, serverless deployment. **Works with**: Llama 3, Mistral, Stable Diffusion, Whisper, BAAI embeddings. **Setup time**: Under 5 minutes.

## Core Features

### 1. Pre-Built Model Catalog

| Category | Models |
|----------|--------|
| Text Generation | Llama 3.1 (8B/70B), Mistral 7B, Gemma |
| Embeddings | BAAI bge-base, bge-large |
| Image Generation | Stable Diffusion XL, FLUX.1 |
| Speech-to-Text | Whisper |
| Translation | Meta M2M-100 |
| Classification | BERT, DistilBERT |
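Each category in the table expects a differently shaped input. The sketch below pairs a few text-only categories with an example model ID and request body. `buildRequest` is a hypothetical helper, and the translation and classification model IDs, along with the `en`/`fr` language codes, are assumptions to check against the current model catalog.

```typescript
// Model IDs other than the Llama and bge-base ones are assumptions;
// confirm them against the Workers AI model catalog before use.
type Task = "chat" | "embedding" | "translation" | "classification";

interface ModelRequest {
  model: string;
  input: Record<string, unknown>;
}

// Map a task and a piece of text to a model ID plus its input payload.
function buildRequest(task: Task, text: string, targetLang = "fr"): ModelRequest {
  switch (task) {
    case "chat":
      return {
        model: "@cf/meta/llama-3.1-8b-instruct",
        input: { messages: [{ role: "user", content: text }] },
      };
    case "embedding":
      return { model: "@cf/baai/bge-base-en-v1.5", input: { text: [text] } };
    case "translation":
      return {
        model: "@cf/meta/m2m100-1.2b",
        input: { text, source_lang: "en", target_lang: targetLang },
      };
    case "classification":
      return { model: "@cf/huggingface/distilbert-sst-2-int8", input: { text } };
  }
}
```

In a Worker, the returned pair could be handed to `env.AI.run(model, input)`; with the generated `Ai` types a cast on the model name may be needed.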

### 2. Vectorize (Built-In Vector DB)

```typescript
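// Assumes `env.VECTORIZE_INDEX` is a binding to an existing Vectorize index
// (the index itself is created beforehand, e.g. with Wrangler).
const index = env.VECTORIZE_INDEX;

// Embed the document and insert it
const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["document text here"],
});
await index.upsert([{ id: "doc1", values: embedding.data[0], metadata: { title: "..." } }]);

// Embed the query with the same model, then search
const queryEmbedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["search query here"],
});
const queryVector = queryEmbedding.data[0];
const results = await index.query(queryVector, { topK: 5 });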
```
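To make the `topK` query above more concrete, here is a brute-force sketch of what a nearest-neighbour lookup computes, assuming cosine similarity as the distance metric. It is an illustration only and not how Vectorize is implemented; `StoredVector`, `cosineSimilarity`, and `topK` are names made up for this example.

```typescript
interface StoredVector {
  id: string;
  values: number[];
}

interface Match {
  id: string;
  score: number;
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the k best matches.
function topK(query: number[], vectors: StoredVector[], k: number): Match[] {
  return vectors
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A managed index avoids this full scan, which is the main reason to use one once the collection grows beyond a few thousand vectors.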

### 3. AI Gateway
Route, cache, and monitor AI API calls:

```typescript
const response = await fetch("https://gateway.ai.cloudflare.com/v1/{account}/my-gateway/openai/chat/completions", {
  method: "POST",
  headers: { "Authorization": "Bearer sk-...", "Content-Type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o", messages: [...] }),
});
```

Features: caching, rate limiting, fallbacks, analytics, logging.
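The endpoint in the request above follows the pattern `/v1/{account}/{gateway}/{provider}/{path}`. A small sketch of assembling that URL from its parts is shown below; `gatewayUrl` is a hypothetical helper, and the account and gateway values in the usage line are placeholders.

```typescript
// Build an AI Gateway URL from its parts, following the pattern used above.
function gatewayUrl(
  accountId: string,
  gatewayId: string,
  provider: string,
  path: string,
): string {
  const segments = [accountId, gatewayId, provider].map((s) => encodeURIComponent(s));
  // Drop leading slashes so the path joins cleanly.
  const cleanPath = path.replace(/^\/+/, "");
  return `https://gateway.ai.cloudflare.com/v1/${segments.join("/")}/${cleanPath}`;
}
```

For example, `gatewayUrl("ACCOUNT_ID", "my-gateway", "openai", "/chat/completions")` yields the same shape of URL as the `fetch` call above, which keeps the provider and gateway name in one place if several calls share them.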

### 4. Edge Deployment
Models run on Cloudflare's GPU fleet across 300+ cities:
- P50 latency: < 50ms for embeddings
- Auto-scaling: 0 to millions of requests
- No cold starts for popular models

### 5. Pay-Per-Request Pricing

| Resource | Free Tier | Paid |
|----------|-----------|------|
| Neurons (compute) | 10,000/day | $0.011 per 1,000 |
| Vectorize queried vector dimensions | 30M/mo | $0.01 per 1M |
| Vectorize stored vector dimensions | 5M | $0.05 per 100M |
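As a worked example of the neuron pricing in the table, the sketch below estimates a daily compute bill. It assumes the free 10,000 neurons are deducted before the paid rate applies, which should be confirmed against the current pricing page; `dailyNeuronCostUsd` is a hypothetical helper.

```typescript
// Figures taken from the pricing table above.
const FREE_NEURONS_PER_DAY = 10000;
const USD_PER_1000_NEURONS = 0.011;

// Estimated daily cost in USD, assuming the free allocation is deducted first.
function dailyNeuronCostUsd(neuronsUsed: number): number {
  if (neuronsUsed < 0) throw new Error("neuronsUsed must be non-negative");
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

Under these assumptions, a day that uses 110,000 neurons bills 100,000 of them, or about $1.10, and any day at or below 10,000 neurons costs nothing.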

## FAQ

**Q: Can I use my own fine-tuned models?**
A: Yes, via LoRA adapters on supported base models.

**Q: How does it compare to AWS Bedrock?**
A: Workers AI is edge-native (lower latency globally), simpler to use, and cheaper for small-to-medium workloads. Bedrock offers more enterprise models.

**Q: Is there a free tier?**
A: Yes. 10,000 neurons per day are free, enough for roughly 100-200 LLM requests depending on the model and the length of each prompt and response.

## Source & Thanks

> Created by [Cloudflare](https://developers.cloudflare.com/workers-ai/).
>
> Documentation: [developers.cloudflare.com/workers-ai](https://developers.cloudflare.com/workers-ai/)

<!-- ZH -->


## Quick Start

```bash
npm create cloudflare@latest my-ai-app
```

Deploy an AI app to 300+ edge cities worldwide in 5 minutes.

## What is Workers AI?

Cloudflare Workers AI runs AI model inference on the global edge network — no GPU management, auto-scaling, pay-per-request.

**In one sentence**: Cloudflare Workers AI provides serverless AI inference in 300+ cities worldwide, supporting Llama, Stable Diffusion, Whisper, and more.

**For**: Developers needing low-latency serverless AI deployment.

## Core Features

### 1. Preloaded Model Catalog
Text generation, embeddings, image generation, speech-to-text, and more.

### 2. Built-In Vector Database
Vectorize provides embedding storage and query.

### 3. AI Gateway
Routing, caching, and monitoring for AI API calls.

### 4. Edge Deployment
GPU clusters in 300+ cities worldwide, with p50 latency < 50ms for embeddings.

## FAQ

**Q: Is there a free tier?**
A: Yes — 10,000 neurons per day are free.

**Q: How does it compare to AWS Bedrock?**
A: Workers AI is edge-native with lower latency, simpler, and cheaper for small-to-medium workloads.

## Source & Thanks

> [Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/)

---
Source: https://tokrepo.com/en/workflows/cloudflare-ai-workers-deploy-ai-apps-edge-bd8d0961
Author: Cloudflare