TOKREPO · AI Arsenal

Stable

Log Analysis and Search Pack

Ten picks for the engineer reading logs at 3 a.m.: structured loggers, a shipping and storage stack (Fluent Bit → Loki / Elasticsearch / ClickHouse), local SQL tailing with lnav, Sentry error grouping, and MCP servers so an AI agent can query traces and alerts directly.

10 resources

About this pack

What this pack solves

It's 3 a.m. The pager says 5xx rate jumped. You SSH in, tail -f a file that's already rotated, grep for an exception that's actually three exceptions sharing a substring, and 40 minutes later you've narrowed it to "something in checkout." That's the problem this pack kills.

The goal isn't observability theatre — no fifteen dashboards no one opens. The goal is: structured logs go in one end, a question comes out the other, and the question can be asked by you, a teammate, or an AI agent with MCP access.

Every pick here is open-source or has a self-hostable open-source core. The full pipeline runs on a single mid-size VM up to about 50 GB/day of log volume; past that, you split Loki/ClickHouse onto their own boxes. No vendor lock-in, no per-GB pricing surprises.

Install in this order

winston (Node) or Loguru (Python) — start with structured logging in your app. JSON output, one log line per event, every line has timestamp, level, service, trace_id. If your logs aren't structured at the source, every downstream tool is fighting your formatter instead of doing its job.
Fluent Bit — the shipper. Tails files / journald / Docker logs, parses JSON, adds host labels, batches, retries, ships to your store. Tiny C binary, ~5 MB RSS, runs as a sidecar or DaemonSet. The non-negotiable middle layer.
Grafana Loki — the store, default pick. Indexes labels (not full text), uses object storage, cheap to run. Best when you ship structured JSON and search by service=checkout level=error. LogQL feels like PromQL — five minutes to learn if you know Prometheus.
Elasticsearch — alternative store when you need full-text search across log message bodies, not just labels. Heavier (JVM, more disk) but unbeatable when the question is "find every log mentioning OrderId=abc-123 anywhere". Pair with Kibana for the UI.
ClickHouse: alternative store when you have a lot of logs (>100 GB/day) and want SQL. Columnar, eats compressed JSON for breakfast; aggregation queries that take Elasticsearch around 30 seconds can run in about 1 second. The right pick at scale.
lnav: the local terminal log navigator. SQL queries against log files directly, live tailing, format auto-detection, syntax highlighting that makes errors stand out. The tool you reach for when SSH'd to one box and the centralized store isn't relevant. Single binary, no daemon.
Sentry — error grouping + alerting. Different from Loki/ES/CH — those store all logs; Sentry catches exceptions and stack traces, groups duplicates intelligently, sends an alert when a new error appears or volume spikes. Self-hostable.
SigNoz MCP Server — Model Context Protocol bridge. Lets Claude / ChatGPT / Cursor query SigNoz's traces, logs, and alerts conversationally. "What's the slowest endpoint in the last hour?" → real answer from real data, not hallucinated.
ClickHouse MCP — the safer MCP pick when your store is ClickHouse. Read-only by default, drop-table protection, parameterized queries. Hand it to an agent without panicking that it'll DROP DATABASE production.
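
As a rough illustration of the first step, the sketch below emits one JSON line per event carrying the four fields named above (timestamp, level, service, trace_id). It uses only Python's standard logging module rather than winston or Loguru, so read it as a picture of the target output shape, not as either library's API. The names JsonFormatter, build_logger and demo, the "event" key, and the "fields" key passed through extra are all assumptions made for this example.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname.lower(),
            "service": self.service,
            # trace_id arrives per call via `extra`; None if the caller omitted it.
            "trace_id": getattr(record, "trace_id", None),
            "event": record.getMessage(),
        }
        # Additional structured fields travel under the hypothetical `fields` key.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(service: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    return logger


def demo() -> dict:
    """Log one event into a buffer and return it parsed back from JSON."""
    buffer = io.StringIO()
    log = build_logger("checkout", buffer)
    log.error(
        "payment_declined",
        extra={"trace_id": "abc123", "fields": {"order_id": "o-42"}},
    )
    return json.loads(buffer.getvalue())
```

In an application you would pass sys.stdout as the stream so that the shipper can pick the lines up; the in-memory buffer here only makes the output easy to inspect.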

How the pipeline fits together

[ your app ]
     │
     ▼  winston / Loguru   (structured JSON to stdout)
     │
[ Fluent Bit ]   (parses, labels, batches)
     │
     ├──▶ Loki           ← cheap, label-indexed
     ├──▶ Elasticsearch  ← full-text-heavy queries
     └──▶ ClickHouse     ← high-volume SQL analytics
     │
     ├──▶ Sentry         ← errors only, grouped + alerted
     │
     ▼  read paths:
        - lnav            (local file, no daemon)
        - Grafana         (Loki UI)
        - Kibana          (ES UI)
        - SigNoz MCP      (AI agent → traces/logs/alerts)
        - ClickHouse MCP  (AI agent → SQL, read-only)

The critical insight: pick one store, not all three. Loki is the right default for 80% of teams. Move to Elasticsearch only if full-text search across message bodies is a daily need. Move to ClickHouse only when log volume + query latency push you off Loki. The pack lists all three because the right answer depends on your traffic shape — not because you should install all three.
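
The "pick one store" rule can be written down as a small decision function. This is only a sketch of the reasoning above: the function name pick_store is hypothetical, the 100 GB/day threshold is the figure quoted for ClickHouse in the install list, and letting volume take precedence over full-text needs is an assumption, not something the pack prescribes.

```python
def pick_store(gb_per_day: float, daily_full_text_search: bool) -> str:
    """Suggest a single log store from volume and query style.

    Assumed precedence: very high volume first, then full-text needs,
    otherwise fall back to the default pick.
    """
    if gb_per_day > 100:
        return "clickhouse"
    if daily_full_text_search:
        return "elasticsearch"
    return "loki"
```

For a team shipping 5 GB/day that mostly filters by service and level, this returns "loki"; the same team with a daily need to search message bodies gets "elasticsearch".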

Tradeoffs you'll hit

Loki vs Elasticsearch vs ClickHouse — Loki is cheapest to run and easiest to operate, but its full-text search is genuinely weak (substring matches across millions of lines are slow). Elasticsearch is the opposite: heavy to run, brilliant at "find this string anywhere." ClickHouse is the SQL nuclear option — incredibly fast at aggregations but you write SQL, not LogQL/KQL. Pick the one whose tradeoff matches your usual question.
winston vs pino, Loguru vs structlog: winston is the Node default, but pino is faster and its ecosystem has caught up. Loguru is the Python default, but structlog is more flexible if you need complex context binding. This pack picks the defaults; switch later if you hit a real limit.
Sentry vs the log store — Sentry overlaps with your log store on error capture. Worth running both: Sentry for the "new error appeared, page on-call" loop; the log store for the "reconstruct the request sequence" loop. They're different jobs.
MCP server vs custom agent tools — MCP standardizes how agents call your tools, so any MCP-aware client (Claude Desktop, Cursor, ChatGPT custom GPTs) can use the same SigNoz/ClickHouse access. Custom OpenAI function-calling is more flexible per-agent but doesn't port. MCP wins for any tool you'll expose to more than one agent runtime.

Common pitfalls

Logging strings instead of structured fields — log.info("user " + userId + " failed") is unsearchable. log.info({ event: "login_failed", userId }) is queryable in any of the stores. This is the single change that makes 80% of the rest of the stack worthwhile.
Fluent Bit without flow control — under burst load, Fluent Bit's tail input can OOM. Set Mem_Buf_Limit and enable file-based buffering before you discover this in production.
Loki labels with high cardinality — never label by user_id, request_id, trace_id. Loki's storage cost is linear in unique label-set count; one accidental high-cardinality label can 100x your bill. Keep labels to service, env, level, host.
Sentry sample rate at 100% — fine until your background job spams the same error 50k times in 10 minutes and you hit your quota. Use the SDK's before_send to deduplicate aggressive loops at the source.
MCP server left in read-write mode: many MCP server docs show the read-write example first. For ClickHouse MCP specifically, read-only mode (set in env) is the only safe setting when an agent is on the other end. Audit the config.
Indexing log messages as schema — ES/CH will let every JSON field become a column or mapping. Six months later you have 12,000 fields, half of them typos from one buggy service. Normalize event names and field names at the logger, not at the store.
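
For the Sentry pitfall above, here is a minimal sketch of a before_send hook that drops repeats of the same exception once a per-window cap is reached. The Sentry Python SDK calls before_send(event, hint) and discards the event when the hook returns None; everything else here (the make_before_send factory, the cap of 5 per 60 seconds, keying on the last exception's type and value) is an assumption chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Event = Dict[str, Any]


def make_before_send(
    max_per_window: int = 5,
    window_seconds: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[Event, Dict[str, Any]], Optional[Event]]:
    """Build a hook that lets through at most `max_per_window` events
    per (exception type, message) within each time window."""
    # key -> (window start, events seen in this window)
    # Note: grows with the number of distinct keys and is not thread-safe.
    seen: Dict[Tuple[Any, Any], Tuple[float, int]] = {}

    def before_send(event: Event, hint: Dict[str, Any]) -> Optional[Event]:
        values = (event.get("exception") or {}).get("values") or []
        if not values:
            return event  # not an exception event, pass it through
        last = values[-1]
        key = (last.get("type"), last.get("value"))
        now = clock()
        start, count = seen.get(key, (now, 0))
        if now - start >= window_seconds:
            start, count = now, 0  # window expired, start a new one
        count += 1
        seen[key] = (start, count)
        # Returning None tells the SDK to drop the event.
        return event if count <= max_per_window else None

    return before_send
```

It would be registered with something like sentry_sdk.init(dsn=..., before_send=make_before_send()). The injectable clock exists only so the windowing can be exercised without waiting.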

INSTALL · ONE COMMAND

$ tokrepo install pack/log-analysis-search

hand it to your agent, or paste it into your terminal

What's inside

10 ready-to-install resources

Skill#01

winston — Versatile Logging Library for Node.js

winston is the most popular logging library for Node.js, offering multiple transports, structured JSON output, and configurable log levels for production applications.

by Script Depot·178 views

$ tokrepo install winston-versatile-logging-library-node-js-17e7e031

Skill#02

Loguru — Python Logging Made Stupidly Simple

Loguru replaces Python logging boilerplate with a single import. No handlers, no formatters, no config files — just logger.info(). It adds colorized output, structured context, file rotation, and exception diagnosis out of the box.

by Script Depot·199 views

$ tokrepo install loguru-python-logging-made-stupidly-simple-6922366e

Skill#03

Fluent Bit — Lightweight High-Performance Log and Metrics Processor

Fluent Bit is a fast, lightweight telemetry agent from the Fluentd family. It collects logs, metrics and traces from any source, processes them with filters, and forwards them to dozens of backends.

by AI Open Source·240 views

$ tokrepo install fluent-bit-lightweight-high-performance-log-metrics-18438936

Skill#04

Grafana Loki — Prometheus-Inspired Log Aggregation System

Loki is a horizontally scalable, multi-tenant log aggregation system by Grafana Labs. Unlike other log systems, Loki indexes metadata about logs, not log content itself.

by Grafana Labs·382 views

$ tokrepo install grafana-loki-prometheus-inspired-log-aggregation-system-92fa7c1f

Skill#05

Elasticsearch — Distributed Search and Analytics Engine

Elasticsearch is the most popular search and analytics engine. It provides near-real-time full-text search, structured search, analytics, and logging across petabytes of data — powering search for Wikipedia, GitHub, Stack Overflow, and millions of applications.

by Script Depot·274 views

$ tokrepo install elasticsearch-distributed-search-analytics-engine-8cbbd0e8

Config#06

ClickHouse — Open Source Real-Time Analytics Database

ClickHouse is a lightning-fast, open-source column-oriented database for real-time analytics. Query billions of rows in milliseconds with SQL. Used by Cloudflare, Uber, eBay.

by AI Open Source·169 views

$ tokrepo install clickhouse-open-source-real-time-analytics-database-2fce985b

Skill#07

lnav — The Logfile Navigator with SQL and Live Tailing

lnav is an advanced log file viewer that understands dozens of log formats, provides SQL queries against log records, live-tails rotating files, and merges multiple logs into one view by timestamp.

by Script Depot·193 views

$ tokrepo install lnav-logfile-navigator-sql-live-tailing-4493f997

Skill#08

Sentry — Open Source Error Tracking & Performance Monitoring

Sentry is the developer-first error tracking and performance monitoring platform. Capture exceptions, trace performance issues, and debug production errors across all languages.

by AI Open Source·316 views

$ tokrepo install sentry-open-source-error-tracking-performance-monitoring-ece57add

MCP#09

SigNoz MCP Server — Query Traces, Logs & Alerts

SigNoz MCP Server connects MCP clients to your SigNoz instance: query traces/logs, inspect alerts, and automate observability workflows using an API key.

by MCP Hub·246 views

$ tokrepo install signoz-mcp-server-query-traces-logs-alerts

MCP#10

ClickHouse MCP — Read-Only Defaults + Drop Protection

ClickHouse MCP connects MCP clients to ClickHouse or embedded chDB with read-only defaults, optional writes, and double opt-in for DROP/TRUNCATE safety.

by MCP Hub·128 views

$ tokrepo install clickhouse-mcp-read-only-defaults-drop-protection

Frequently asked questions

Do I really need all three of Loki, Elasticsearch, and ClickHouse?

No — pick one. The pack lists all three because the right answer depends on your shape. Loki is the default for ~80% of teams: cheap, label-indexed, easy to run. Pick Elasticsearch if your daily question is 'find this string anywhere in any message body' (it's much better at unstructured full-text). Pick ClickHouse when you cross ~100 GB/day or need real SQL analytics on logs. Running all three is fine for a comparison week, painful as a permanent state.

Where does AI fit in this stack — is the SigNoz MCP just a chat UI?

It's more than a chat UI. The MCP server exposes traces, logs, and alerts as tools an AI agent can call autonomously. A practical example: a Claude agent triages a Sentry alert by querying SigNoz for the trace, pulls the corresponding logs from Loki, and writes a one-paragraph incident summary into your ticket, all from one prompt. The ClickHouse MCP plays the same role for SQL-style log analytics, with read-only enforced so the agent can't drop a table.
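
To make "read-only enforced" concrete, the sketch below shows one naive way a tool wrapper could refuse anything other than a single read statement before it reaches the database. This is not how ClickHouse MCP is implemented; the function name is_read_only and the keyword allow-list are assumptions. A string check like this is easy to get wrong, so real enforcement belongs on the server side, for example by giving the agent a user with ClickHouse's readonly setting.

```python
import re

# Statement keywords treated as reads in this sketch (an assumed allow-list).
READ_ONLY_PREFIXES = ("select", "show", "describe", "explain", "with")

# Matches `-- line comments` and `/* block comments */`.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", flags=re.DOTALL)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with a read keyword."""
    stripped = _COMMENTS.sub(" ", sql).strip()
    stripped = stripped.rstrip(";").strip()
    # Reject empty input and anything that still contains a semicolon.
    # This is deliberately conservative: it also rejects semicolons
    # that appear inside string literals.
    if not stripped or ";" in stripped:
        return False
    first_word = stripped.split(None, 1)[0].lower()
    return first_word in READ_ONLY_PREFIXES
```

A wrapper would call is_read_only on the agent's query and return an error instead of executing it when the check fails, as a second line of defense behind the server-side permission.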

Why winston/Loguru instead of just printing JSON manually?

Three reasons. First: structured fields are added by API, not string concatenation, so they're consistent across the codebase. Second: log levels, sampling, and transports (file / stdout / network) are decoupled from call sites. Third: ecosystems — winston has 100+ transports, Loguru integrates with FastAPI/Django out of the box. You could roll your own with json.dumps, but you'll re-invent these features within a month.

Is Sentry redundant if I already ship error logs to Loki?

No, the jobs differ. Loki/ES/CH store everything indiscriminately and answer 'show me the sequence around this request.' Sentry deduplicates exceptions by stack trace, groups them as 'issues,' tracks first-seen / regression / volume spike, and pages you when a new issue appears. Treat Sentry as your error inbox and your log store as the witness: both serve you, neither replaces the other.

Can this whole pack run on one VM, or do I need a Kubernetes cluster?

One mid-size VM (16 vCPU, 32 GB RAM, 500 GB SSD) comfortably handles up to ~50 GB/day of log volume with Loki + Fluent Bit + self-hosted Sentry. Past 50 GB/day, move Loki's storage to S3-compatible object storage and give ClickHouse/Elasticsearch their own nodes. You don't need Kubernetes for this; docker-compose is fine and arguably preferable below 50 GB/day. Add k8s when you have ops appetite to maintain it, not because the log pipeline requires it.
