
Cloudflare AI Workers — Deploy AI Apps at the Edge

Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.

TL;DR
Workers AI runs LLMs, embeddings, and image models on Cloudflare's edge network with zero infrastructure management.
§01

What it is

Workers AI lets you run AI inference on Cloudflare's global edge network. It provides serverless access to models including Meta Llama, Stable Diffusion, Whisper, and embedding models. You write a few lines of TypeScript, deploy to Cloudflare, and your AI app runs close to users across 300+ data centers.

This workflow is for developers who want low-latency AI inference without provisioning GPUs or managing model serving infrastructure. It suits prototyping, production APIs, and edge-first applications where latency matters.

§02

How it saves time or tokens

The workflow provides a ready-to-deploy project scaffold. Instead of configuring model endpoints, GPU instances, or container orchestration, you get a single npm create command that generates a working Cloudflare Worker with AI bindings. Deployment is one command away, and Cloudflare handles scaling, routing, and model hosting.
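The generated project declares the AI binding in its wrangler configuration. A minimal sketch of the relevant wrangler.toml, assuming the scaffold's default names (the compatibility date here is illustrative):

name = "my-ai-app"
main = "src/index.ts"
compatibility_date = "2026-04-07"  # illustrative; use your project's date

# Exposes Workers AI to the Worker as env.AI
[ai]
binding = "AI"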

§03

How to use

  1. Scaffold a new Cloudflare AI project:
npm create cloudflare@latest my-ai-app
cd my-ai-app
  2. Write your Worker with AI bindings:
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding declared in wrangler.toml
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'user', content: 'Explain edge computing in one paragraph' }
        ]
      }
    );
    // Response.json sets the Content-Type header for us
    return Response.json(response);
  }
};
  3. Deploy to Cloudflare's edge:
npx wrangler deploy
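After deployment, Wrangler prints your Worker's URL on your workers.dev subdomain. You can sanity-check it with a plain HTTP request (the hostname below is a placeholder for your own subdomain):

curl https://my-ai-app.<your-subdomain>.workers.dev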
§04

Example

// Text embeddings for semantic search
export default {
  async fetch(request, env) {
    const text = 'How do I deploy an AI model?';
    // bge-base-en-v1.5 maps each input string to a 768-dimension vector
    const embeddings = await env.AI.run(
      '@cf/baai/bge-base-en-v1.5',
      { text: [text] }
    );
    return Response.json({ embeddings: embeddings.data });
  }
};
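To build semantic search on top of these vectors, you compare a query embedding against stored document embeddings, usually with cosine similarity. A minimal helper sketch, independent of any Cloudflare API:

// Cosine similarity between two equal-length embedding vectors
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

In production you would typically store document vectors in a vector database (Cloudflare's Vectorize pairs naturally with Workers AI) rather than comparing them by hand.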
§05

Common pitfalls

  • Workers AI model availability varies by plan. Free tier has request limits and not all models are available. Check the Cloudflare dashboard for your plan's model catalog.
  • Response streaming is off by default: the basic fetch pattern above buffers and returns the full response at once. To stream, pass the stream option and return the resulting readable stream from your Worker (see the sketch after this list).
  • Cold starts on less-popular models can add latency on the first request. Frequently used models stay warm across Cloudflare's network.
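A minimal streaming sketch using the stream option, which makes AI.run resolve to a ReadableStream of Server-Sent Events that the Worker can return directly:

export default {
  async fetch(request, env) {
    // stream: true makes AI.run resolve to a ReadableStream of SSE chunks
    const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'Explain edge computing' }],
      stream: true
    });
    // Pass the stream through so tokens reach the client as they are generated
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' }
    });
  }
};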

Frequently Asked Questions

What AI models are available on Workers AI?

Workers AI supports text generation models (Meta Llama family), embedding models (BGE, sentence-transformers), image generation (Stable Diffusion), speech-to-text (Whisper), and more. The model catalog is updated regularly on the Cloudflare developer docs.

How much does Workers AI cost?

Workers AI includes a free tier with a limited number of neurons (Cloudflare's billing unit for AI inference). Paid plans charge per neuron consumed. Exact pricing varies by model size and is detailed on the Cloudflare pricing page.

Can I use Workers AI with my own fine-tuned models?

Workers AI currently supports Cloudflare's curated model catalog. You cannot upload custom weights. If you need custom models, consider using Workers as a proxy to your own inference endpoint.
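A rough sketch of that proxy pattern, assuming hypothetical INFERENCE_URL and INFERENCE_API_KEY values you would configure yourself (for example with wrangler secret put):

export default {
  async fetch(request, env) {
    // Read the incoming body and forward it to your own inference endpoint.
    // INFERENCE_URL and INFERENCE_API_KEY are hypothetical names for
    // values configured on the Worker, not part of the Workers AI API.
    const body = await request.text();
    return fetch(env.INFERENCE_URL, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${env.INFERENCE_API_KEY}`
      },
      body
    });
  }
};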

Does Workers AI support streaming responses?

Yes. You can stream text generation responses token by token using Server-Sent Events. This requires using the stream option in the AI.run call and returning a readable stream from your Worker.

What regions does Workers AI run in?

Workers AI runs on Cloudflare's global network of 300+ data centers. Requests are routed to the nearest data center that has the requested model available. GPU-accelerated inference is available in select locations.
