TOKREPO · AI Arsenal
New · this week

Deploy + Monitoring + Observability Stack

Ten tools for developers who ship to prod: deploy targets (Vercel / Kamal / Coolify), error tracking, OpenTelemetry, metrics, logs, dashboards, uptime, and alerts, chained in a deliberate order to actually catch the next outage.

10 resources

What's in this pack

This is the stack a working backend engineer would assemble the week before their app gets real users — not the heroic post-outage scramble. Every pick here is open-source-first, runs on a $20 VPS or smaller, and plugs into the next tool in the chain. The order matters: each layer feeds the next.

| #  | Pick                    | Layer                     | What it does                                                            |
|----|-------------------------|---------------------------|-------------------------------------------------------------------------|
| 1  | Vercel CLI              | deploy (PaaS)             | preview URL on every git push, zero config for Next/Nuxt/Astro          |
| 2  | Kamal                   | deploy (container)        | zero-downtime Docker deploys to any bare VPS (Basecamp's tool)          |
| 3  | Coolify                 | deploy (self-hosted PaaS) | open-source Vercel/Heroku replacement for your own server               |
| 4  | Sentry                  | errors + APM              | exception capture, release health, performance traces                   |
| 5  | OpenTelemetry Collector | telemetry pipeline        | vendor-neutral fan-in for traces, metrics, logs                         |
| 6  | Prometheus              | metrics                   | pull-based time-series DB, the industry default                         |
| 7  | Grafana Loki            | logs                      | log aggregation that thinks like Prometheus: cheap, indexed by label    |
| 8  | Grafana                 | dashboards                | the wall display every other tool plugs into                            |
| 9  | Uptime Kuma             | uptime + status page      | self-hosted heartbeat that pages you when the site dies                 |
| 10 | Prometheus Alertmanager | alert routing             | dedupe, group, route alerts to PagerDuty / Slack / email                |

Install in this order (deploy → errors → telemetry pipeline → metrics → logs → uptime → alerts → dashboards)

The order is deliberate. Don't install dashboards first. Empty dashboards teach you nothing. Wire the data sources first; the dashboard is the last 10% of the work.

  1. Pick one deploy target. Vercel CLI if you're shipping a JS framework and want preview URLs on every PR. Kamal if you've outgrown Heroku-style pricing and want to own the box. Coolify if you want the Vercel UX on your own hardware. Pick one. Skip the other two.
  2. Sentry next. Errors are the single highest-signal telemetry you'll add. Five lines of SDK init and you start catching exceptions you didn't know existed. Set up release tracking from day one so you can answer "did this start with the last deploy?"
  3. OpenTelemetry Collector. Don't lock yourself to one vendor's SDK. The Collector is a single Go binary that receives OTLP from your app and fans out to Sentry, Prometheus, Loki, or anything else. Configure it once, swap backends without touching app code.
  4. Prometheus for metrics. Scrape /metrics from your app, your Node Exporter, your database exporters. The four golden signals — latency, traffic, errors, saturation — go here.
  5. Loki for logs. If you already use Prometheus, Loki is the obvious log store: same label model, same query language flavor, runs on the same VM. Don't index every JSON field; index by service, env, level — let LogQL filter the rest.
  6. Uptime Kuma for the heartbeat. External-perspective ping. Catches the outages your internal stack can't see (DNS, TLS cert, CDN). Public status page included.
  7. Alertmanager wired to Prometheus. Alerts should fire on symptoms (p95 latency > 2s, error rate > 1%), not causes (CPU > 80%). Route P1 to pager, P2 to Slack, P3 to a daily digest.
  8. Grafana last. Now that data is flowing, build three dashboards: one for the on-call engineer (latency, error rate, recent deploys), one for the product owner (signups, conversions, cost per user), one for the exec (uptime %, MAU, week-over-week). Generic dashboards get ignored.
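
To make step 7 concrete, here is a minimal Python sketch of symptom-based alerting. It is not how Alertmanager itself is configured; it only illustrates the logic. The thresholds (p95 > 2s, error rate > 1%) and the P1/P2/P3 routing come from the list above. The `Window`, `Rule`, and rule names are hypothetical, and treating both symptom rules as P2 is an assumption. Note that CPU usage deliberately does not appear as a rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Routing table from the text: P1 -> pager, P2 -> Slack, P3 -> daily digest.
ROUTES = {"P1": "pager", "P2": "slack", "P3": "daily-digest"}


@dataclass
class Window:
    """Request outcomes observed during one evaluation window."""
    latencies_s: List[float]
    errors: int

    @property
    def error_rate(self) -> float:
        total = len(self.latencies_s)
        return self.errors / total if total else 0.0


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile; 0.0 for an empty window."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


@dataclass
class Rule:
    name: str
    breached: Callable[[Window], bool]
    severity: str  # assumption: both symptom rules below are treated as P2


RULES = [
    Rule("HighLatencyP95", lambda w: p95(w.latencies_s) > 2.0, "P2"),
    Rule("HighErrorRate", lambda w: w.error_rate > 0.01, "P2"),
]


def evaluate(window: Window) -> List[Tuple[str, str]]:
    """Return (rule name, channel) for every rule that fires."""
    return [(r.name, ROUTES[r.severity]) for r in RULES if r.breached(window)]


if __name__ == "__main__":
    slow = Window(latencies_s=[0.1] * 94 + [3.0] * 6, errors=0)
    print(evaluate(slow))
```

In a real deployment the same idea would live in Prometheus alerting rules with a severity label, and Alertmanager would do the routing.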

Tradeoffs you'll hit

  • Vercel vs Kamal vs Coolify — Vercel = zero-ops, scales to zero, gets expensive at scale and you don't own the stack. Kamal = own the box, Docker is the only abstraction, cheap and predictable. Coolify = the middle ground; self-hosted UI on top of Docker. Most teams ship the MVP on Vercel, migrate to Kamal/Coolify when the bill hits $500/mo.
  • Sentry SaaS vs self-hosted: Self-hosted Sentry runs a whole stack of backing services (Kafka, Postgres, Redis, ClickHouse, and more). For under 100k events/month, the hosted SaaS is genuinely cheaper than your time. Self-host only when you've outgrown the cheap SaaS tiers and have ops bandwidth.
  • Prometheus + Loki + Grafana vs Datadog: Datadog is the polished hosted incumbent. The open stack costs ~$20/mo in VPS, where a comparable Datadog bill often runs $300+/mo. Tradeoff: you babysit the stack. Below ~10 services, open-source wins on cost and lock-in; above ~50, Datadog's ergonomics start to matter.
  • Push vs pull metrics — Prometheus is pull (it scrapes you). If you run serverless or short-lived jobs, pull doesn't work — use a Pushgateway, or switch to OpenTelemetry push to a Collector. Don't fight the model.
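
For the short-lived-job case in the last bullet, a push to a Pushgateway can be sketched as below. This is a minimal sketch, assuming a Pushgateway on its default port 9091. The metric name, job name, and helper functions are hypothetical. The request is only built, not sent, so the snippet runs without a gateway.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, value: float,
                 labels: Optional[Dict[str, str]] = None) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{escape_label(v)}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"  # the body must end with a newline
    )


def build_push_request(gateway_url: str, job: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that replaces all metrics for this job group."""
    job_path = urllib.parse.quote(job, safe="")
    url = gateway_url.rstrip("/") + "/metrics/job/" + job_path
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


if __name__ == "__main__":
    body = render_gauge(
        "backup_last_success_timestamp_seconds",  # hypothetical metric name
        "Unix time of the last successful backup.",
        1700000000,
        {"env": "prod"},
    )
    req = build_push_request("http://localhost:9091", "nightly-backup", body)
    print(req.get_method(), req.full_url)
    print(body, end="")
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```

Prometheus then scrapes the Pushgateway like any other target, so the pull model stays intact on the Prometheus side.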

Common pitfalls

  • Alerting on causes, not symptoms. "CPU > 80%" pages you at 3am for a workload that's fine. "User-facing p95 > 2s" pages you only when it matters. Tune for symptoms; investigate causes after waking up.
  • No release annotation in Grafana. Half of all incidents start "right after the deploy." Wire your deploy script to POST a Grafana annotation on every release. The marker on the timeline saves 20 minutes per incident.
  • Indexing every log field. Loki's whole point is that it doesn't. If you add 50 labels per log line, cardinality explodes and the cheap log store becomes expensive. Index by service, env, level — grep the rest.
  • One alert channel for everything. P1 (site down) → phone. P2 (degraded) → Slack with @channel. P3 (anomaly) → daily digest. Mix them and either you ignore the pager or you ignore the digest. Both fail.
  • No external uptime check. Your internal Prometheus thinks the service is up. Cloudflare or your CDN is dropping 30% of requests in eu-west. Uptime Kuma from a different network catches this. Five minutes to set up.
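
The "index by service, env, level" advice from the log pitfall above can be sketched in Python as follows. This is a minimal sketch that builds the JSON body for Loki's push endpoint (`/loki/api/v1/push`). Only the three low-cardinality fields become labels; everything else stays inside the log line. The record fields and the `build_loki_push` helper are hypothetical.

```python
import json
from typing import Any, Dict

# Only these fields become Loki labels (the index); names follow the text above.
INDEXED_LABELS = ("service", "env", "level")


def build_loki_push(record: Dict[str, Any], ts_ns: int) -> Dict[str, Any]:
    """Split a structured log record into a small label set plus a JSON log line."""
    labels = {k: str(record[k]) for k in INDEXED_LABELS if k in record}
    rest = {k: v for k, v in record.items() if k not in INDEXED_LABELS}
    line = json.dumps(rest, sort_keys=True)
    # Loki expects each entry as [timestamp in nanoseconds as a string, log line].
    return {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}


if __name__ == "__main__":
    record = {
        "service": "api",
        "env": "prod",
        "level": "error",
        "user_id": 42,  # high-cardinality: kept in the line, not indexed
        "msg": "payment failed",
    }
    payload = build_loki_push(record, ts_ns=1_700_000_000_000_000_000)
    print(json.dumps(payload, indent=2))
    # POST this as application/json to <loki-url>/loki/api/v1/push
```

At query time the unindexed fields are still reachable, for example with a LogQL query such as `{service="api", env="prod", level="error"} | json | user_id="42"`.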
INSTALL · ONE COMMAND
$ tokrepo install pack/deploy-monitor-observability
hand it to your agent, or paste it into your terminal
What's inside

10 ready-to-install resources

Script#01
Vercel CLI — Preview Deployments from Terminal

Vercel CLI runs dev servers, pulls project env, and creates preview or production deployments from the terminal. Useful for agent-built web changes.

by Vercel·73 views
$ tokrepo install vercel-cli-preview-deployments-from-terminal
Skill#02
Kamal — Zero-Downtime Docker Deploys to Any Server

Kamal is Basecamp's deploy tool that ships Docker containers to bare metal or cloud VMs with a single command, giving you Heroku-like workflows on servers you actually own.

by Script Depot·121 views
$ tokrepo install kamal-zero-downtime-docker-deploys-any-server-5211d45c
Skill#03
Coolify — Self-Hosted Vercel & Netlify Alternative

Deploy apps, databases, and services on your own server with one click. No vendor lock-in. 52K+ GitHub stars.

by AI Open Source·152 views
$ tokrepo install coolify-self-hosted-vercel-netlify-alternative-202dfab1
Skill#04
Sentry — Open Source Error Tracking & Performance Monitoring

Sentry is the developer-first error tracking and performance monitoring platform. Capture exceptions, trace performance issues, and debug production errors across all languages.

by AI Open Source·173 views
$ tokrepo install sentry-open-source-error-tracking-performance-monitoring-ece57add
Skill#05
OpenTelemetry Collector — Vendor-Neutral Telemetry Pipeline

The OpenTelemetry Collector is the pipeline from the CNCF OpenTelemetry project for receiving, processing, and exporting metrics, logs, and traces across any observability backend, replacing per-vendor agents with one portable binary.

by AI Open Source·130 views
$ tokrepo install opentelemetry-collector-vendor-neutral-telemetry-pipeline-1e161adc
Skill#06
Prometheus — Open Source Monitoring & Alerting Toolkit

Prometheus is the CNCF-graduated monitoring system and time series database. Pull-based metrics collection, powerful PromQL queries, and built-in alerting for cloud-native infrastructure.

by AI Open Source·135 views
$ tokrepo install prometheus-open-source-monitoring-alerting-toolkit-ed3a8de4
Skill#07
Grafana Loki — Prometheus-Inspired Log Aggregation System

Loki is a horizontally scalable, multi-tenant log aggregation system by Grafana Labs. Unlike other log systems, Loki indexes metadata about logs, not log content itself.

by Grafana Labs·209 views
$ tokrepo install grafana-loki-prometheus-inspired-log-aggregation-system-92fa7c1f
Skill#08
Grafana — Open Source Data Visualization & Observability

Grafana is the leading open-source platform for monitoring and observability. Visualize metrics, logs, and traces from Prometheus, Loki, Elasticsearch, and 100+ data sources.

by Grafana Labs·193 views
$ tokrepo install grafana-open-source-data-visualization-observability-ed1a524f
MCP#09
Uptime Kuma — Self-Hosted Uptime Monitoring

Monitor HTTP, TCP, DNS, Docker services with notifications to 90+ channels. Beautiful dashboard. 84K+ GitHub stars.

by MCP Hub·170 views
$ tokrepo install uptime-kuma-self-hosted-uptime-monitoring-88e260be
Skill#10
Prometheus Alertmanager — Alert Routing and Notification Hub

Alertmanager handles alerts sent by Prometheus, deduplicating, grouping, and routing them to the right notification channel such as email, Slack, PagerDuty, or webhooks.

by Script Depot·133 views
$ tokrepo install prometheus-alertmanager-alert-routing-notification-hub-51f92d7e
Frequently asked questions

Do I really need all ten of these? It looks like a lot.

You need one from each layer, not all ten. The pack lists alternatives within layers (three deploy targets, two metric paths via Prometheus or OTel) — pick the one that fits your scale. The minimum viable stack for a 1-person indie ship is: Vercel CLI + Sentry + Uptime Kuma. Add Prometheus + Grafana + Alertmanager when you have a second engineer. Add Loki + OpenTelemetry Collector when you're past 10 services. Don't install ahead of need.

What's the realistic monthly cost for this whole stack?

For a small team: Vercel free or $20/mo, Sentry free tier (5k errors/mo) or $26/mo, then a single $5-20 VPS to host Prometheus + Loki + Grafana + Uptime Kuma + Alertmanager together (they're all light on RAM). Total: as little as $5/mo on the free tiers, and about $25-66/mo once you pay for Vercel, Sentry, or both, for production observability that catches real outages. Compare to Datadog at $15-31 per host per month, often $300+/mo for the same coverage.

How does this overlap with the LLM Observability pack?

LLM Observability (Langfuse, Phoenix, AgentOps) is the application-semantic layer — prompt traces, token costs, eval scores. This Deploy + Monitor + Observability pack is the infrastructure layer — is the container alive, is the HTTP p95 acceptable, did the deploy break the error rate. You want both. The OpenTelemetry Collector in this pack can ingest LLM traces from Langfuse/Phoenix and forward them alongside infra metrics, so on-call sees both on one Grafana dashboard.

Why Kamal over Docker Swarm or Nomad?

Kamal is opinionated to the point of being boring, which is what you want for deploys. It only does zero-downtime container rollouts and proxy-based routing (Traefik in Kamal 1, kamal-proxy in Kamal 2): no scheduler, no service mesh, no YAML cathedral. For 1-10 servers it's the simplest thing that works. Swarm is in maintenance mode; Nomad is great but the operational footprint is larger than a small team needs. Reach for k8s only when you have someone whose full-time job is k8s.

Can I use this stack with a serverless backend (AWS Lambda, Cloudflare Workers)?

Yes, but the scrape model breaks. For serverless, use OpenTelemetry SDKs that push traces and metrics to the OpenTelemetry Collector via OTLP. The Collector then writes to Prometheus (via remote_write) and Loki, and everything else in the pack works unchanged. Uptime Kuma still pings the public URL, Sentry's SDK works in Lambda/Workers runtimes, and Grafana dashboards don't care where the data came from.
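
As a rough illustration of the push path, the sketch below hand-builds an OTLP/HTTP JSON metrics payload and the request that would carry it to a Collector. It assumes the Collector's OTLP HTTP receiver is listening on the default port 4318. The service name, metric name, and helper functions are hypothetical. In a real function you would normally use the OpenTelemetry SDK for your language instead of building the payload by hand; the point here is only to show what gets pushed.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def _attrs(values: Dict[str, str]) -> List[Dict[str, Any]]:
    """Encode plain key/value pairs as OTLP string attributes."""
    return [{"key": k, "value": {"stringValue": str(v)}} for k, v in values.items()]


def otlp_gauge_payload(service: str, metric: str, value: float, ts_ns: int,
                       attributes: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build an OTLP JSON body carrying a single gauge data point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": _attrs({"service.name": service})},
            "scopeMetrics": [{
                "scope": {"name": "manual-sketch"},  # hypothetical scope name
                "metrics": [{
                    "name": metric,
                    "unit": "ms",
                    "gauge": {"dataPoints": [{
                        "asDouble": float(value),
                        # 64-bit integers are encoded as strings in OTLP JSON.
                        "timeUnixNano": str(ts_ns),
                        "attributes": _attrs(attributes or {}),
                    }]},
                }],
            }],
        }]
    }


def build_otlp_request(collector_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST to the Collector's /v1/metrics endpoint."""
    return urllib.request.Request(
        collector_url.rstrip("/") + "/v1/metrics",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    payload = otlp_gauge_payload(
        "checkout-fn", "invocation.duration", 12.5,
        ts_ns=1_700_000_000_000_000_000, attributes={"env": "prod"},
    )
    req = build_otlp_request("http://localhost:4318", payload)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```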
