
Skills · Mar 29, 2026 · 1 min read

Cloudflare Workers AI — Serverless AI Inference

Run AI models at the edge with Cloudflare Workers. Text generation, image generation, speech-to-text, translation, embeddings — all serverless with global distribution.

Cloudflare · Community

Agent-ready

Install with prior review

This asset requires review. The copied prompt requests a dry run, shows the planned writes, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

Agent surface: Any MCP/CLI agent
Type: Skill
Install: Single
Trust: Community
Entry: Cloudflare Workers AI — Serverless AI Inference

Command with prior review

npx -y tokrepo@latest install 422d0627-e9a9-4dd2-80a0-859e0dc25edf --target codex

Do a dry run first, confirm the writes, and then run this command.

TL;DR

Cloudflare Workers AI runs AI models serverlessly at the edge with global distribution and no GPUs to manage.

§01

What it is

Cloudflare Workers AI lets you run AI models at the edge with zero infrastructure management. It supports text generation, image generation, speech-to-text, translation, embeddings, and more -- all as serverless function calls within Cloudflare Workers. Models run on Cloudflare's global GPU network, so inference happens close to your users.

Cloudflare Workers AI is for developers who want to add AI capabilities to their applications without provisioning GPUs, managing model serving infrastructure, or dealing with cold starts.

Workers AI is a managed Cloudflare service rather than an open-source project you host yourself. The models in its catalog are largely open-source models, and Cloudflare's documentation covers the common use cases.

§02

How it saves time or tokens

Self-hosting AI models requires GPU servers, model loading, scaling logic, and monitoring. Managed GPU services such as AWS SageMaker still require provisioning and incur idle costs. Workers AI is fully serverless: you pay per inference request, with no idle costs, no cold starts, and no infrastructure to manage. Deployment is a single wrangler command.

§03

How to use

Create a Cloudflare Workers project with wrangler.
Add the AI binding to wrangler.toml so the Worker can access env.AI.
Call env.AI.run() with a model name and input data.
Deploy with npx wrangler deploy.

§04

Example

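// src/index.js
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding configured for this Worker.
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'What is Cloudflare Workers?' }
      ]
    });
    // Return the model output as JSON with an explicit content type.
    return new Response(JSON.stringify(response), {
      headers: { 'content-type': 'application/json' }
    });
  }
};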

# Create and deploy
npx wrangler init my-ai-app
cd my-ai-app
npx wrangler deploy
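
The same env.AI.run() call can serve the other task types mentioned earlier. As a rough sketch, the handler below requests embeddings for two sentences and compares them by cosine similarity. The model ID `@cf/baai/bge-base-en-v1.5`, the `{ text: [...] }` input, and the `{ data: [...] }` output shape are assumptions to verify against the model card; `cosineSimilarity` and `handleSimilarity` are illustrative names, not part of any API.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical handler: embeds two fixed sentences and returns their similarity.
async function handleSimilarity(request, env) {
  // Assumed model ID and I/O shape: { text: string[] } in, { data: number[][] } out.
  const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: ['What is Cloudflare Workers?', 'Tell me about Cloudflare Workers.']
  });
  const body = { similarity: cosineSimilarity(data[0], data[1]) };
  return new Response(JSON.stringify(body), {
    headers: { 'content-type': 'application/json' }
  });
}

// In a Worker module, expose it with: export default { fetch: handleSimilarity };
```

Only the model ID and the input object change between tasks; the binding call itself stays the same.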

§05

Related on TokRepo

AI Gateway: Cloudflare -- Cloudflare AI Gateway for routing and observability
AI Tools for API -- API development and inference tools

§06

Common pitfalls

Workers AI has model-specific token limits. Large prompts that exceed the model's context window may be truncated without an obvious error. Check the model card for limits before sending requests.
The free tier has daily request limits. Monitor usage in the Cloudflare dashboard to avoid hitting rate limits during development.
Not all models are available in all regions. Some models may have higher latency depending on which Cloudflare data center handles the request.
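
One way to guard against the token-limit pitfall is to trim the conversation before calling the model. The sketch below is only an illustration: `MAX_PROMPT_TOKENS` and the four-characters-per-token estimate are placeholder assumptions, not values from any model card, and `estimateTokens` and `trimMessages` are hypothetical helpers.

```javascript
// Placeholder budget (assumption); replace with the limit from the model card.
const MAX_PROMPT_TOKENS = 2048;

// Very rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep all system messages, then as many of the most recent other
// messages as fit within the budget, dropping the oldest first.
function trimMessages(messages, budget = MAX_PROMPT_TOKENS) {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```

Passing `trimMessages(messages)` as the `messages` value in env.AI.run() keeps the request under the chosen budget, at the cost of dropping older turns.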

Before adopting this tool, evaluate whether it fits your team's existing workflow. Read the official documentation thoroughly, and start with a small proof-of-concept rather than a full migration. Community forums, GitHub issues, and Stack Overflow are valuable resources when you encounter edge cases not covered in the documentation.

Frequently asked questions

Which AI models are available on Workers AI?

Workers AI offers Llama 3, Mistral, Stable Diffusion, Whisper (speech-to-text), translation models, and embedding models. The catalog grows regularly. Check the Cloudflare Workers AI model page for the current list.

How is Workers AI priced?

Workers AI offers a free tier with daily request limits. Paid usage is billed per inference request based on the model and input size. There are no idle costs or GPU provisioning fees.

Can I use Workers AI with existing Cloudflare Workers?

Yes. Workers AI is accessed through the env.AI binding in any Cloudflare Worker. Add the AI binding to your wrangler.toml and call env.AI.run() in your worker code.

Does Workers AI support streaming responses?

Yes. Workers AI supports streaming for text generation models. Use the stream option to receive tokens as they are generated, enabling real-time chat experiences.
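
As a minimal sketch of that option, the handler below passes `stream: true` and forwards the result to the client. It assumes the binding then returns a ReadableStream of server-sent events rather than a complete JSON object; `handleChat` is an illustrative name.

```javascript
// Hypothetical chat handler that streams tokens back as server-sent events.
async function handleChat(request, env) {
  // With stream: true, the binding is assumed to return a ReadableStream
  // of server-sent events instead of a complete JSON object.
  const stream = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
    messages: [{ role: 'user', content: 'What is Cloudflare Workers?' }],
    stream: true
  });
  // Forward the stream unchanged so the client receives tokens as they arrive.
  return new Response(stream, {
    headers: { 'content-type': 'text/event-stream' }
  });
}

// In a Worker module, expose it with: export default { fetch: handleChat };
```

On the client side, the response body can be read incrementally, for example with a stream reader on the fetch response.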

How does Workers AI compare to the OpenAI API?

The OpenAI API provides access to GPT-4 and other proprietary models. Workers AI runs open-source models on Cloudflare's edge network. Because inference runs close to the user, it can offer lower latency for a global audience, and a Worker reaches it through the env.AI binding instead of a separately managed API key.


Source and acknowledgments

Created by Cloudflare. Cloudflare Workers AI

