Skills · April 7, 2026 · 1 min read

Cloudflare AI Workers — Deploy AI Apps at the Edge

Run AI models on Cloudflare's global edge network. Workers AI provides serverless inference for LLMs, embeddings, image generation, and speech-to-text at low latency.

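Agent ready

Review before installing

This asset needs review first. The copied instructions ask the agent to do a dry run and list what it will write, then continue only after you confirm.

Needs Confirmation · 64/100 · Policy: needs confirmation
Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Community
Entry: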
Cloudflare AI Workers — Deploy AI Apps at the Edge
Review-first command:
npx -y tokrepo@latest install bd8d0961-db4e-4890-828e-095163614679 --target codex

Do a dry run first and confirm what will be written before running this command.

TL;DR
Workers AI runs LLMs, embeddings, and image models on Cloudflare's edge network with zero infrastructure management.
§01

What it is

Workers AI is Cloudflare's serverless inference service. It gives you access to models including Meta Llama, Stable Diffusion, Whisper, and embedding models on Cloudflare's global edge network. You write a few lines of TypeScript or JavaScript, deploy with Wrangler, and your AI app runs close to users across Cloudflare's 300+ data centers.

This workflow is for developers who want low-latency AI inference without provisioning GPUs or managing model serving infrastructure. It suits prototyping, production APIs, and edge-first applications where latency matters.

§02

How it saves time or tokens

The workflow provides a ready-to-deploy project scaffold. Instead of configuring model endpoints, GPU instances, or container orchestration, you get a single npm create command that generates a working Cloudflare Worker with AI bindings. Deployment is one command away, and Cloudflare handles scaling, routing, and model hosting.

§03

How to use

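  1. Scaffold a new Cloudflare AI project:
npm create cloudflare@latest my-ai-app
cd my-ai-app
  2. Write your Worker with AI bindings. The code below assumes the Workers AI binding is named AI in your Wrangler configuration:
// Worker entry point: sends one fixed prompt to a Llama model.
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding; run() takes a model ID and its inputs.
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'user', content: 'Explain edge computing in one paragraph' }
        ]
      }
    );
    // Response.json serializes the result and sets the JSON Content-Type header.
    return Response.json(response);
  }
};
  3. Deploy to Cloudflare's edge:
npx wrangler deploy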
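
The Worker in the steps above always sends the same prompt. As a minimal sketch of accepting the prompt from the caller instead, the handler below reads a JSON body of the form { "prompt": "..." }. The body shape, the `handlePrompt` and `jsonResponse` names, and the 400 responses are assumptions made for illustration, not part of the original workflow.

```javascript
// Small helper that builds a JSON response with an explicit status code.
function jsonResponse(data, status = 200) {
  return new Response(JSON.stringify(data), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

// Hypothetical handler: reads { "prompt": "..." } from the request body and
// forwards it to the model. env.AI is assumed to be the Workers AI binding.
async function handlePrompt(request, env) {
  let body;
  try {
    body = await request.json();
  } catch {
    return jsonResponse({ error: 'Body must be valid JSON' }, 400);
  }
  if (typeof body?.prompt !== 'string' || body.prompt.trim() === '') {
    return jsonResponse({ error: 'Missing "prompt" string' }, 400);
  }
  const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [{ role: 'user', content: body.prompt }],
  });
  return jsonResponse(result);
}

// In a Worker module this would be wired up as:
// export default { fetch: handlePrompt };
```

Validating the input before calling the model avoids spending inference quota on malformed requests.
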
§04

Example

// Text embeddings for semantic search
export default {
  async fetch(request, env) {
    const text = 'How do I deploy an AI model?';
    const embeddings = await env.AI.run(
      '@cf/baai/bge-base-en-v1.5',
      { text: [text] }
    );
    return Response.json({ embeddings: embeddings.data });
  }
};
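
Embedding vectors like the ones returned above are typically compared with cosine similarity to rank documents against a query. The following is a minimal sketch that is independent of Workers AI itself; `cosineSimilarity` and `rankBySimilarity` are illustrative names, and the short vectors in the usage lines stand in for real embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns documents sorted from most to least similar to the query vector.
function rankBySimilarity(queryVector, docs) {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy 3-dimensional vectors standing in for real embeddings.
const ranked = rankBySimilarity([1, 0, 0], [
  { id: 'a', vector: [0, 1, 0] },
  { id: 'b', vector: [1, 0, 0] },
]);
console.log(ranked[0].id); // prints "b"
```

In a real semantic search setup the document vectors would be computed once and stored, and only the query would be embedded per request.
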
§05

Common pitfalls

  • Workers AI model availability varies by plan. Free tier has request limits and not all models are available. Check the Cloudflare dashboard for your plan's model catalog.
  • Response streaming is opt-in. Pass stream: true to env.AI.run and return the resulting readable stream with a text/event-stream content type. The basic fetch pattern returns the full response at once.
  • Cold starts on less-popular models can add latency on the first request. Frequently-used models stay warm across Cloudflare's network.

FAQ

What AI models are available on Workers AI?

Workers AI supports text generation models (Meta Llama family), embedding models (BGE, sentence-transformers), image generation (Stable Diffusion), speech-to-text (Whisper), and more. The model catalog is updated regularly on the Cloudflare developer docs.

How much does Workers AI cost?

Workers AI includes a free tier with a limited number of neurons (Cloudflare's billing unit for AI inference). Paid plans charge per neuron consumed. Exact pricing varies by model size and is detailed on the Cloudflare pricing page.

Can I use Workers AI with my own fine-tuned models?

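Workers AI serves models from Cloudflare's curated catalog, and you cannot upload arbitrary custom weights. Cloudflare does document LoRA adapter support for some catalog models, so check the current docs if a fine-tune of a supported base model is enough. For fully custom models, consider using a Worker as a proxy to your own inference endpoint.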
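
The proxy approach can be sketched as a Worker handler that forwards the request body to an inference server you operate. The `INFERENCE_URL` and `INFERENCE_TOKEN` variable names and the bearer-token header are hypothetical; substitute whatever your own endpoint expects. The `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Hypothetical proxy: forwards a JSON request body to a self-hosted
// inference endpoint. env.INFERENCE_URL and env.INFERENCE_TOKEN are
// assumed configuration values, not Cloudflare-defined names.
async function proxyToInference(request, env, fetchImpl = fetch) {
  if (request.method !== 'POST') {
    return new Response('Method Not Allowed', { status: 405 });
  }
  const upstream = await fetchImpl(env.INFERENCE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${env.INFERENCE_TOKEN}`,
    },
    body: await request.text(),
  });
  // Pass the upstream status and body through unchanged.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      'Content-Type': upstream.headers.get('Content-Type') ?? 'application/json',
    },
  });
}

// In a Worker module this would be wired up as:
// export default { fetch: (request, env) => proxyToInference(request, env) };
```

Keeping the token in the Worker's environment means clients never see the credentials for the upstream server.
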

Does Workers AI support streaming responses?

Yes. You can stream text generation responses token by token using Server-Sent Events. This requires using the stream option in the AI.run call and returning a readable stream from your Worker.
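
A minimal sketch of this streaming pattern, assuming the Workers AI binding is named AI as in the earlier examples. With stream: true, the call resolves to a readable stream of Server-Sent Events, which the handler returns directly. The `streamCompletion` name and the prompt are illustrative.

```javascript
// Streams a text generation response to the client as Server-Sent Events.
// env.AI is assumed to be the Workers AI binding.
async function streamCompletion(request, env) {
  const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [{ role: 'user', content: 'Explain edge computing in one paragraph' }],
    stream: true, // ask for a readable stream instead of a complete result
  });
  // Return the stream as-is; the client reads events as they arrive.
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}

// In a Worker module this would be wired up as:
// export default { fetch: streamCompletion };
```

Because the Worker passes the stream through without buffering, the first tokens reach the client before generation finishes. In Cloudflare's documented examples, clients read events until a final data: [DONE] line.
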

What regions does Workers AI run in?

Workers AI runs on Cloudflare's global network of 300+ data centers. Requests are routed to the nearest data center that has the requested model available. GPU-accelerated inference is available in select locations.
