TOKREPO · AI Arsenal
New · this week

Agent Deployment Templates

Ten picks for devs taking an AI agent to production: FastAPI skeletons (Agno, PydanticAI), serverless targets (Modal, Replicate), sandbox runtimes (E2B, Daytona), state store and queue (Upstash, LangGraph), and a Kubernetes target, chained in order so the agent survives its first 1,000 real requests.

10 resources

What's in this pack

This is the stack a working engineer would assemble the week before shipping an AI agent to real users — not the heroic post-launch scramble when the first OOM kill takes the service down. Every pick here is a deployment template in the literal sense: clone a repo, set a few env vars, and you have an agent that handles concurrent requests, persists state, sandboxes untrusted code, and recovers from process death. Open-source-first, runs cheaply, and each layer plugs into the next.

#  | Pick                         | Layer                            | What it does
1  | Agno                         | agent skeleton (FastAPI)         | Production agent runtime with FastAPI serving, sessions, integrations
2  | PydanticAI                   | agent skeleton (typed)           | Type-safe agent framework; Pydantic models as the I/O contract
3  | Modal Sandboxes              | serverless sandbox               | Run agent-generated code in isolated cloud sandboxes
4  | modal-examples               | serverless template              | Reference repo for serverless LLM jobs on Modal
5  | Replicate Cog                | serverless template (container)  | One YAML file → containerized model with HTTP + webhook API
6  | E2B                          | sandbox runtime                  | Secure cloud sandboxes for AI-generated code; Python/JS SDK
7  | Daytona SDK                  | sandbox runtime                  | Programmable dev sandboxes with snapshots and reproducible workspaces
8  | Upstash (Redis + Kafka)      | state store + queue              | Serverless Redis for sessions, Kafka for the work queue
9  | LangGraph                    | stateful agent graphs            | Build agents as graphs with explicit state + checkpoints
10 | Agent Sandbox on Kubernetes  | deploy target                    | Pattern + manifests for running agents safely on a k8s cluster

Install in this order (skeleton → state + queue → sandbox → serverless template → graph → deploy target)

The order is deliberate. Don't pick a deploy target first. You'll end up rewriting the agent to fit the platform's quirks. Get the skeleton, state, and sandbox right locally; the deploy target is the last decision.

  1. Pick one agent skeleton. If you want a batteries-included runtime with FastAPI serving, sessions, and tracing already wired, pick Agno. If you want a smaller, type-first surface where Pydantic models are the I/O contract and you assemble the HTTP layer yourself, pick PydanticAI. Either way, the goal is a /run endpoint that accepts a request and returns a typed response. Build this locally first.
  2. Add a state store before you add tools. As soon as an agent has a session, you need somewhere to put it — pick Upstash Redis (serverless, pay-per-request, no idle cost) for session/cache and Upstash Kafka (or any managed queue) for the work queue if turns can take more than 30 seconds. Don't write "state" to local disk; the next pod restart erases it.
  3. Wrap untrusted tool calls in a sandbox. The moment your agent executes generated code, runs shell commands, or browses the web, you need isolation. E2B is the lowest-friction choice: import Sandbox from the e2b_code_interpreter package, create a sandbox, call sbx.run_code(...), and you're done. Daytona SDK is the alternative when you need persistent, snapshot-able dev workspaces (e.g., long-lived coding agents). Modal Sandboxes is the same primitive co-located with Modal compute, which matters if you're also deploying on Modal.
  4. Pick a serverless template if requests are bursty. Modal (and modal-examples as the reference repo) gives you a Python decorator that becomes an HTTP endpoint with GPU access, scale-to-zero, and per-second billing — ideal for agents whose requests arrive in clusters. Replicate Cog packages a model + handler as one container with cog.yaml; great if you also serve a model and want a single deploy artifact.
  5. For long-running or stateful flows, use LangGraph. When the agent is a multi-step graph (plan → search → reflect → answer) with branches and human-in-the-loop, LangGraph gives you explicit state + checkpoints — meaning a crashed turn can resume instead of restarting. Pair its checkpointer with your Redis from step 2.
  6. Pick the deploy target last. Three realistic paths: (a) Serverless: wrap the FastAPI app in a Modal @modal.asgi_app() function, or package it with Cog (cog build, then cog push for Replicate) and ship it as a container. (b) PaaS: push the FastAPI app to Fly/Render/Railway behind a simple Dockerfile; cheapest for steady traffic. (c) Kubernetes: when you need multi-tenancy, gVisor isolation, or you've outgrown the PaaS box, use Agent Sandbox as the reference pattern for running agents safely on k8s (pod-per-session, sandbox per tool call, network policies that deny by default).
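
Steps 1 and 2 can be sketched without committing to a framework. The following is a minimal, standard-library-only illustration of a typed /run handler backed by a swappable session store. It is not Agno's or PydanticAI's API: RunRequest, RunResponse, SessionStore, InMemoryStore, and run are hypothetical names, and the echo reply stands in for the real agent call.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RunRequest:
    session_id: str
    message: str


@dataclass
class RunResponse:
    session_id: str
    reply: str
    turn: int


class SessionStore(Protocol):
    """Anything that can load and save a session's message history."""

    def load(self, session_id: str) -> list[str]: ...
    def save(self, session_id: str, history: list[str]) -> None: ...


@dataclass
class InMemoryStore:
    """Local-development stand-in (hypothetical); a Redis-backed class would replace it."""

    _data: dict[str, list[str]] = field(default_factory=dict)

    def load(self, session_id: str) -> list[str]:
        return list(self._data.get(session_id, []))

    def save(self, session_id: str, history: list[str]) -> None:
        self._data[session_id] = list(history)


def run(request: RunRequest, store: SessionStore) -> RunResponse:
    """Handle one turn: load history, produce a reply, persist, return a typed result."""
    history = store.load(request.session_id)
    history.append(request.message)
    # Placeholder for the real agent call (the chosen skeleton would go here).
    reply = f"echo: {request.message}"
    store.save(request.session_id, history)
    return RunResponse(session_id=request.session_id, reply=reply, turn=len(history))


store = InMemoryStore()
first = run(RunRequest("s1", "hello"), store)
second = run(RunRequest("s1", "again"), store)
print(first.turn, second.turn)
```

Because run only depends on the SessionStore interface, moving from the in-memory stand-in to Redis should not require touching the handler, which is the point of adding the state store before the tools.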

How the pieces fit

[client]
   │  HTTP /run
   ▼
[FastAPI agent skeleton]  ← Agno or PydanticAI
   │
   ├─ session/cache  ──▶  Upstash Redis
   │
   ├─ background work ──▶  Upstash Kafka  ──▶  worker (same image)
   │
   ├─ tool: run_code ──▶  E2B / Daytona / Modal Sandbox
   │
   ├─ graph state    ──▶  LangGraph checkpointer (Redis)
   │
   ▼
[deploy target]  ── Modal @asgi_app  /  Replicate Cog  /  k8s (Agent Sandbox)

The four-layer combo (agent skeleton + state store + sandbox + deploy target) is the minimum viable production agent. Skip any one and you'll feel it within a week: no state → users hate the amnesia; no sandbox → an rm -rf from a hallucinated tool call ruins your day; no skeleton → you reinvent FastAPI middleware badly; no deploy target → you can't actually serve traffic.

Tradeoffs you'll hit

  • Agno vs PydanticAI — Agno is the bigger framework with sessions, FastAPI app, integrations, tracing already wired; the cost is opinions you have to live with. PydanticAI is smaller and type-first; you bring the HTTP layer. For a team shipping in two weeks: Agno. For a team that already has FastAPI conventions: PydanticAI.
  • E2B vs Daytona vs Modal Sandbox — E2B is the fastest to integrate for ephemeral code execution (Python SDK, secure by default). Daytona shines when you need persistent, snapshot-able workspaces (long-lived coding agents). Modal Sandbox is the right pick if your compute already lives on Modal — same auth, same billing, lower latency to your model calls.
  • Modal vs Replicate Cog vs k8s — Modal scales to zero, bills per-second, and treats Python as the deploy unit; ideal for bursty agent traffic. Replicate Cog is one container with cog.yaml; ideal when you also serve a model with the agent. Kubernetes (via Agent Sandbox patterns) is the right answer when you need real multi-tenancy, gVisor-level isolation, or you've outgrown the managed-platform box.
  • LangGraph vs handwritten state machine — For one-shot agents (single LLM call + tool), don't pull in LangGraph; it's overhead. For multi-turn graphs with branches, retries, and human-in-the-loop, LangGraph's checkpointer earns its weight by making crashes resumable.
  • Upstash Redis vs self-hosted Redis — Upstash is serverless and pay-per-request; great until you exceed ~10M commands/month, where a $20 Redis VM gets cheaper. The migration is one URL change. Don't optimize early.
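
The break-even point in the last tradeoff is simple arithmetic, worked here in whole cents to avoid floating-point rounding. The per-command price is an assumption ($0.20 per 100K commands, chosen because it reproduces the ~10M figure above); check current pricing before relying on it. The function names are hypothetical.

```python
def break_even_commands(vm_cents_per_month: int, cents_per_100k_commands: int) -> int:
    """Monthly command count at which pay-per-request cost equals a flat VM price."""
    return vm_cents_per_month * 100_000 // cents_per_100k_commands


def pay_per_request_cents(commands: int, cents_per_100k_commands: int) -> int:
    """Pay-per-request bill in cents, rounded down to whole cents."""
    return commands * cents_per_100k_commands // 100_000


VM_CENTS = 2_000      # the $20/month VM from the text
CENTS_PER_100K = 20   # ASSUMPTION: $0.20 per 100K commands

threshold = break_even_commands(VM_CENTS, CENTS_PER_100K)
print(threshold)                                          # 10000000
print(pay_per_request_cents(3_000_000, CENTS_PER_100K))   # 600, i.e. $6.00
```

At 3M commands a month the pay-per-request bill is well under the VM price, which is why the advice is to defer the migration until traffic approaches the threshold.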

Common pitfalls

  • Writing session state to local disk or memory. The next pod restart erases it and users blame you for amnesia. State goes to Redis (or your DB) from day one, not after the first incident.
  • Tool calls without a sandbox. The first time an agent hallucinates subprocess.run(['rm', '-rf', '/']) and your service runs it, you've lost the production cluster. E2B/Daytona/Modal Sandbox is not optional once tools include shell or code execution.
  • Serverless for long agent turns. Most serverless platforms cap execution time (Lambda 15 min, Vercel about 5 min depending on plan, Modal up to 24 h). If your agent turn can take 30+ minutes, either pick Modal (long timeouts) or push the work to a queue and let the HTTP request return a job ID.
  • No request-level timeout in the FastAPI skeleton. Without a timeout, one hung LLM call exhausts your worker pool. Set explicit timeouts at the HTTP boundary, at the LLM client, and at the tool call — three layers.
  • Logging the full prompt + response. It feels useful in dev. In prod it leaks PII into log aggregators that aren't compliant. Truncate, redact, or sample before logging — and pair with LLM-observability tooling (Langfuse, Phoenix) for full traces under access control.
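
The three-layer timeout advice can be illustrated with asyncio alone. This is a sketch under stated assumptions: call_llm and call_tool are stand-ins that only sleep, handle_request plays the role of the HTTP handler, and the timeout values are shrunk to fractions of a second so the example runs quickly. A real FastAPI app would set the same three bounds with its own client and server settings.

```python
import asyncio


async def call_llm(prompt: str, delay: float) -> str:
    """Stand-in for an LLM client call; `delay` simulates latency."""
    await asyncio.sleep(delay)
    return f"answer to {prompt!r}"


async def call_tool(name: str, delay: float) -> str:
    """Stand-in for a tool call (for example, a sandboxed code run)."""
    await asyncio.sleep(delay)
    return f"{name} ok"


async def agent_turn(prompt: str, llm_delay: float, tool_delay: float) -> str:
    # Layer 2: bound the LLM call.
    answer = await asyncio.wait_for(call_llm(prompt, llm_delay), timeout=0.2)
    # Layer 3: bound the tool call.
    tool = await asyncio.wait_for(call_tool("search", tool_delay), timeout=0.2)
    return f"{answer} / {tool}"


async def handle_request(prompt: str, llm_delay: float = 0.0, tool_delay: float = 0.0) -> dict:
    # Layer 1: bound the whole request at the HTTP boundary.
    try:
        body = await asyncio.wait_for(agent_turn(prompt, llm_delay, tool_delay), timeout=0.3)
        return {"status": 200, "body": body}
    except asyncio.TimeoutError:
        # A hung dependency becomes a fast, explicit error instead of a stuck worker.
        return {"status": 504, "body": "timed out"}


ok = asyncio.run(handle_request("hi"))
hung = asyncio.run(handle_request("hi", llm_delay=5.0))                    # trips layer 2
slow = asyncio.run(handle_request("hi", llm_delay=0.15, tool_delay=0.19))  # trips layer 1
print(ok["status"], hung["status"], slow["status"])
```

The third call shows why the outer bound is still needed: each inner call stays under its own limit, but the turn as a whole exceeds the request budget.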
INSTALL · ONE COMMAND
$ tokrepo install pack/agent-deployment-templates
hand it to your agent, or paste it into your terminal
What's included

10 resources ready to install

Skill#01
Agno — Production AI Agent Runtime

Agno is a runtime for building and managing agentic software at scale. 39.1K+ GitHub stars. Stateful agents, FastAPI serving, 100+ integrations, tracing. Apache 2.0.

by Agno·145 views
$ tokrepo install agno-production-ai-agent-runtime-f73bc89d
Skill#02
PydanticAI — Type-Safe AI Agent Framework

Build production-grade AI agents with type safety, structured outputs, and multi-model support. By the team behind Pydantic.

by Pydantic·99 views
$ tokrepo install pydanticai-type-safe-ai-agent-framework-0313bf39
Agent#03
Modal Sandboxes — Secure Cloud Code Execution for AI Agents

Modal Sandboxes spin up secure Linux environments for agent-generated code in seconds. Custom images, GPUs, persistent volumes from any Modal Function.

by Modal·72 views
$ tokrepo install modal-sandboxes-secure-cloud-code-execution-for-ai-agents
Skill#04
modal-examples — Serverless LLM Jobs on Modal

Learn production patterns for serverless jobs (LLM inference, data pipelines) using Modal’s official examples. Run one and adapt it to your workload.

by Script Depot·40 views
$ tokrepo install modal-examples-serverless-llm-jobs-on-modal
Skill#05
Replicate Cog — Containerize ML Models with One YAML File

Cog is Replicate's open-source tool to wrap an ML model in a Docker container. One cog.yaml + predict.py gives you a portable, GPU-aware HTTP model.

by Replicate·44 views
$ tokrepo install replicate-cog-containerize-ml-models-with-one-yaml-file
Skill#06
E2B — Secure Sandboxes for AI Code

E2B runs AI-generated code in isolated cloud sandboxes. Install the Python/JS SDK, set `E2B_API_KEY`, then execute commands safely inside a sandbox.

by Agent Toolkit·93 views
$ tokrepo install e2b-secure-sandboxes-for-ai-code
Skill#07
Daytona SDK — Programmable Dev Sandboxes for AI Agents

Daytona SDK spawns Linux dev environments in 90 ms. Run agent-generated code, browser automation, ML jobs. Snapshot + fork to branch execution.

by Daytona·78 views
$ tokrepo install daytona-sdk-programmable-dev-sandboxes-for-ai-agents
MCP#08
Upstash MCP — Serverless Redis & Kafka for AI Agents

MCP server for Upstash serverless Redis and Kafka. Give AI agents access to caching, rate limiting, pub/sub, and message queues with zero infrastructure. Pay-per-request pricing. 2,000+ stars.

by MCP Hub·112 views
$ tokrepo install upstash-mcp-serverless-redis-kafka-ai-agents-e0ed3953
Skill#09
LangGraph — Build Stateful AI Agents as Graphs

LangChain framework for building resilient, stateful AI agents as graphs. Supports cycles, branching, persistence, human-in-the-loop, and streaming. 28K+ stars.

by LangChain·452 views
$ tokrepo install langgraph-build-stateful-ai-agents-graphs-cc1a6ed2
Skill#10
Agent Sandbox — Run Agents Safely on Kubernetes

Agent Sandbox provides Kubernetes-first guardrails for agent workloads: resource limits, isolation, and repeatable environments so failures stay contained.

by AI Open Source·59 views
$ tokrepo install agent-sandbox-run-agents-safely-on-kubernetes
Frequently asked questions

Do I really need all ten of these? It looks like a lot.

You need one from each layer, not all ten. The pack lists alternatives within layers (two skeletons, three sandbox runtimes, three deploy paths) so you can pick what fits your scale. The minimum viable production agent for a solo dev is: Agno (skeleton) + Upstash Redis (state) + E2B (sandbox) + Modal (deploy) — four picks, deploys in an afternoon. Add LangGraph when the agent grows into a multi-step graph. Add Agent Sandbox on Kubernetes when you outgrow the managed platform box.

What does a realistic monthly bill look like for this stack?

For a small agent serving a few thousand requests a day: Modal ~$5-50/mo (pay-per-second compute, scales to zero), Upstash Redis free tier or ~$10/mo, E2B free tier (100h/mo) or ~$30/mo for steady use, no charge for the open-source skeleton or LangGraph. Total: $15-100/mo end-to-end. The variable is LLM cost, which usually dwarfs infra; the picks here are designed so infra stays a rounding error relative to model spend.

How does this overlap with the LLM Observability pack?

Different layers. This pack covers deployment — how the agent process exists, persists, and serves traffic. LLM Observability (Langfuse, Phoenix, AgentOps) covers prompts, traces, and eval scores — the application-semantic layer. Wire both. The agent skeleton emits OpenTelemetry from the start; the observability stack ingests it. Most teams add this pack first (you can't observe an agent you can't deploy) and the observability pack the same week.

Why pick E2B over Modal Sandboxes if I'm already on Modal?

If your compute is already on Modal, Modal Sandboxes is the right pick — same auth, same billing, lower latency to your model calls, no extra vendor. E2B wins when you're deploying on a different target (Fly, Replicate, k8s) and want a sandbox that doesn't drag a second cloud account behind it. Daytona wins when sandboxes need to live for hours/days (persistent dev workspaces for coding agents) rather than seconds.
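
One way to keep this choice reversible is to have the agent's code-execution tool depend on a small interface rather than on a vendor SDK. The sketch below is an assumption-laden illustration: CodeSandbox, ExecResult, FakeSandbox, and run_code_tool are hypothetical names, and no vendor API is called. A real adapter for E2B, Daytona, or Modal Sandboxes would implement the same method using that vendor's SDK.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    stdout: str
    exit_code: int


class CodeSandbox(Protocol):
    """The only surface the agent's run_code tool is allowed to depend on."""

    def run_code(self, code: str) -> ExecResult: ...


class FakeSandbox:
    """Test double that records code instead of executing it.

    A real adapter would forward `code` to the chosen sandbox vendor.
    """

    def __init__(self) -> None:
        self.seen: list[str] = []

    def run_code(self, code: str) -> ExecResult:
        self.seen.append(code)
        return ExecResult(stdout="", exit_code=0)


def run_code_tool(code: str, sandbox: CodeSandbox) -> str:
    """Tool entry point: generated code only ever reaches the sandbox object."""
    result = sandbox.run_code(code)
    return result.stdout if result.exit_code == 0 else f"failed ({result.exit_code})"


sandbox = FakeSandbox()
output = run_code_tool("print('hi')", sandbox)
print(sandbox.seen)
```

The fake also makes the tool testable without a network call or an API key, and nothing in run_code_tool ever executes generated code in the host process.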

Can I use this stack for a long-running, multi-step research agent (not a chat agent)?

Yes — that's actually the case the stack is shaped for. Use LangGraph for the graph with checkpoints (so a crash mid-turn resumes), put the checkpointer state in Upstash Redis, push each long step into Upstash Kafka so it runs in a worker pod, sandbox any tool that executes code in E2B or Daytona, and deploy on Modal (long timeouts) or Kubernetes via Agent Sandbox if you need real multi-tenancy. The HTTP /run endpoint returns a job ID immediately; clients poll or subscribe for results.
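
The job-ID-plus-checkpoint flow described here can be sketched with in-process stand-ins. This is not LangGraph's or any queue's API: a deque plays the work queue, plain dicts play the job table and the checkpoint store, and submit, work_one, and STEPS are hypothetical names. The crash is simulated by raising inside the worker; in a real deployment the queue's redelivery, not the dying worker, would put the job back.

```python
import uuid
from collections import deque
from typing import Optional

# ASSUMPTION: in production these would be a managed queue and Redis.
queue: deque = deque()
jobs: dict = {}
checkpoints: dict = {}

STEPS = ["plan", "search", "reflect", "answer"]


def submit(question: str) -> str:
    """What /run does: enqueue the job and return an ID without waiting."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    checkpoints[job_id] = {"next_step": 0, "notes": [question]}
    queue.append(job_id)
    return job_id


def work_one(fail_at: Optional[str] = None) -> None:
    """Worker body: take one job and resume it from its last checkpoint."""
    job_id = queue.popleft()
    state = checkpoints[job_id]
    jobs[job_id]["status"] = "running"
    try:
        for index in range(state["next_step"], len(STEPS)):
            step = STEPS[index]
            if step == fail_at:
                raise RuntimeError(f"worker died during {step}")
            state["notes"].append(f"{step} done")
            state["next_step"] = index + 1  # checkpoint after every completed step
    except RuntimeError:
        # Simulated redelivery: requeue the job; finished steps are not repeated.
        jobs[job_id]["status"] = "queued"
        queue.append(job_id)
        return
    jobs[job_id] = {"status": "done", "result": state["notes"]}


job_id = submit("why is the sky blue?")
work_one(fail_at="reflect")   # first attempt crashes partway through
status_after_crash = jobs[job_id]["status"]
work_one()                    # second attempt resumes at "reflect"
print(status_after_crash, jobs[job_id]["status"])
```

A client would poll the job table by ID; the second worker run picks up at the step that failed instead of replanning and re-searching from scratch.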
