# Netdata — Real-Time Infrastructure Monitoring & Observability

> Netdata is an open-source monitoring agent that collects thousands of metrics per second with zero configuration. Beautiful dashboards, ML-powered alerts, and instant deployment.

## Install

Save the kickstart script to a file and run it, or start the Docker container:

```bash
# One-line install on any Linux
curl https://get.netdata.cloud/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh

# Or Docker (each trailing backslash continues the command on the next line)
docker run -d --name netdata -p 19999:19999 \
  -v netdata-config:/etc/netdata \
  -v netdata-lib:/var/lib/netdata \
  -v netdata-cache:/var/cache/netdata \
  -v /:/host/root:ro -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro -v /etc/localtime:/host/etc/localtime:ro \
  -v /proc:/host/proc:ro -v /sys:/host/sys:ro \
  --cap-add SYS_PTRACE --security-opt apparmor=unconfined \
  netdata/netdata
```

Open `http://localhost:19999` — see real-time metrics immediately, no configuration needed.
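When the install runs from a provisioning script, the dashboard may need a few seconds before it answers. The sketch below is a small retry helper for that case. `wait_until` is a hypothetical name chosen for illustration and is not part of Netdata.

```bash
# wait_until TRIES DELAY CMD...
# Re-run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
# (Hypothetical helper for illustration; not part of Netdata.)
wait_until() (
  tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      exit 0
    fi
    attempt=$((attempt + 1))
    # Skip the pause after the final failed attempt.
    [ "$attempt" -le "$tries" ] && sleep "$delay"
  done
  exit 1
)
```

Assuming the agent serves its v1 REST API on the default port, `wait_until 30 1 curl -fsS http://localhost:19999/api/v1/info` would poll for roughly 30 seconds before giving up.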

## Intro

**Netdata** is an open-source, real-time infrastructure monitoring and observability platform. It auto-discovers and collects thousands of metrics per second from systems, containers, databases, and applications with zero configuration — presenting everything in beautiful, interactive dashboards that update every second.

With 78.4K+ GitHub stars and a GPL-3.0 license, Netdata is among the most starred monitoring projects on GitHub, valued for its instant deployment, zero-config auto-discovery, and per-second granularity.

## What Netdata Does

- **Auto-Discovery**: Automatically detects and monitors OS, containers, databases, web servers, and 800+ integrations
- **Per-Second Metrics**: Collects metrics every second (Prometheus is commonly configured to scrape every 15s) for real-time visibility
- **Zero Config**: Install and immediately see 2,000+ metrics — no YAML files, no exporters to deploy
- **ML-Powered Alerts**: Machine learning detects anomalies in every metric automatically
- **Beautiful Dashboards**: Interactive, drill-down dashboards that update in real-time
- **Distributed Architecture**: Deploy agents everywhere, view all data in one place via Netdata Cloud
- **Low Overhead**: ~1% CPU, ~100MB RAM for monitoring an entire server with thousands of metrics
- **Long-Term Storage**: Built-in tiered storage with configurable retention
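The per-second data listed above can also be read outside the dashboard. The sketch below builds a query URL for the agent's v1 data API. It assumes the default port 19999 and the `system.cpu` chart, and `netdata_data_url` is a hypothetical helper name.

```bash
# netdata_data_url HOST CHART SECONDS
# Print a v1 data-API URL asking for the last SECONDS seconds of CHART,
# one point per second, as CSV. Host, chart, and window are caller-supplied.
# (Hypothetical helper; the port assumes a default install.)
netdata_data_url() {
  printf 'http://%s:19999/api/v1/data?chart=%s&after=-%s&points=%s&format=csv\n' \
    "$1" "$2" "$3" "$3"
}

# Example: build the URL for the last 10 seconds of total CPU usage.
netdata_data_url localhost system.cpu 10
```

On a host with a running agent, `curl -s "$(netdata_data_url localhost system.cpu 10)"` should return about ten rows, one per second.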

## Architecture

```
┌─────────────────────────────────────────────┐
│  Netdata Agent (on each server)             │
│  ┌───────────┐ ┌──────────┐ ┌────────────┐ │
│  │Collectors │ │ ML Engine│ │ Dashboard  │ │
│  │(800+ auto)│ │(Anomaly) │ │ (Built-in) │ │
│  └───────────┘ └──────────┘ └────────────┘ │
│  ┌───────────┐ ┌──────────┐ ┌────────────┐ │
│  │ TSDB      │ │ Alerts   │ │ Streaming  │ │
│  │(Per-sec)  │ │(ML+Rules)│ │ (to Cloud) │ │
│  └───────────┘ └──────────┘ └────────────┘ │
└─────────────────────────────────────────────┘
```

## What Gets Monitored Automatically

```
System:
├── CPU (per core, per process, by type)
├── Memory (RAM, swap, page faults, NUMA)
├── Disk I/O (per device, latency, utilization)
├── Network (per interface, packets, errors)
├── Processes (count, states, context switches)
└── Sensors (temperature, fans, voltage)

Containers:
├── Docker (per container CPU, memory, I/O, network)
├── Kubernetes (pods, deployments, nodes)
└── LXC/LXD

Databases:
├── MySQL / MariaDB (queries, connections, replication)
├── PostgreSQL (locks, transactions, WAL)
├── Redis (commands, memory, keys)
├── MongoDB (operations, connections, replication)
└── Elasticsearch (indexing, search, cluster health)

Web Servers:
├── Nginx (requests, connections, status)
├── Apache (workers, requests, bandwidth)
├── HAProxy (frontend/backend, sessions)
└── Traefik (entrypoints, routers)

Applications:
├── Node.js, Python, Go, Java (runtime metrics)
├── RabbitMQ, Kafka (queues, messages)
├── DNS servers (queries, cache)
└── 800+ more integrations
```
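To see what a particular agent actually discovered, it can help to group its chart IDs by their prefix. A minimal sketch, assuming chart IDs follow the `type.name` pattern (for example `system.cpu`). `count_by_type` is a hypothetical helper.

```bash
# count_by_type: read chart IDs (one per line, e.g. "system.cpu") on stdin and
# print "<type> <count>" for each prefix before the first dot, sorted by name.
# (Hypothetical helper for illustration; blank lines are ignored.)
count_by_type() {
  awk -F. 'NF { n[$1]++ } END { for (t in n) print t, n[t] }' | sort
}
```

If `jq` is installed and the agent's `/api/v1/charts` response carries a top-level `charts` object, something like `curl -s http://localhost:19999/api/v1/charts | jq -r '.charts | keys[]' | count_by_type` would summarize the collected chart families.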

## Key Features

### ML-Powered Anomaly Detection

Every metric gets a machine learning model trained on its historical patterns:

```
Normal: CPU usage follows daily work pattern
Alert:  CPU anomaly detected — usage 3σ above predicted

Normal: Disk I/O steady at 50 MB/s
Alert:  Disk I/O anomaly — unusual spike to 500 MB/s at 3am
```

No manual threshold configuration needed — ML learns what's normal for YOUR infrastructure.
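The "3σ above predicted" idea in the example can be illustrated with a plain mean and standard deviation check. This is a deliberately simplified stand-in, not Netdata's ML engine, which trains its own models per metric. `flag_latest` is a hypothetical name.

```bash
# flag_latest: read one numeric sample per line on stdin. The last line is the
# newest sample; all earlier lines form the baseline. Prints "anomaly" when the
# newest sample is more than 3 standard deviations above the baseline mean,
# otherwise "normal". (Simplified illustration, not Netdata's actual model.)
flag_latest() {
  awk '
    { v[NR] = $1 }
    END {
      n = NR - 1                       # baseline size: everything but the last
      if (n < 2) { print "normal"; exit }
      for (i = 1; i <= n; i++) sum += v[i]
      mean = sum / n
      for (i = 1; i <= n; i++) sq += (v[i] - mean) ^ 2
      sd = sqrt(sq / n)                # population standard deviation
      if (v[NR] > mean + 3 * sd) print "anomaly"; else print "normal"
    }'
}
```

Feeding it a steady disk throughput around 50 followed by 500, as in `printf '%s\n' 50 52 48 51 49 500 | flag_latest`, reports `anomaly`.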

### Composite Charts

Drill down from a high-level overview to individual metrics:

```
Server Overview → CPU → Per Core → Per Process → System Calls
```

### Alert Notifications

```yaml
# Built-in notification channels:
- Email (SMTP)
- Slack
- Discord
- PagerDuty
- Opsgenie
- Telegram
- Microsoft Teams
- Custom webhook
```
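As one example of enabling a channel, the agent's `health_alarm_notify.conf` is a shell-style file of variable assignments. The fragment below sketches a Slack setup. The webhook URL and channel name are placeholders, and the exact recipient format should be checked against the comments in the file shipped with your version.

```bash
# Fragment for health_alarm_notify.conf (the file uses shell variable syntax).
# The webhook URL and channel name below are placeholders, not real values.
SEND_SLACK="YES"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"
DEFAULT_RECIPIENT_SLACK="alarms"
```

The file is usually edited with `sudo ./edit-config health_alarm_notify.conf` from the Netdata config directory (often `/etc/netdata`), and the bundled `alarm-notify.sh test` command can send a test notification. Paths vary by install method.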

### Streaming & Centralization

```
┌──────────┐     ┌──────────┐     ┌──────────┐
│ Agent 1  │────▶│          │     │ Netdata  │
│ (Web)    │     │  Parent  │────▶│ Cloud    │
│          │     │  Agent   │     │ (SaaS)   │
└──────────┘     │          │     └──────────┘
┌──────────┐     │          │
│ Agent 2  │────▶│          │
│ (DB)     │     └──────────┘
└──────────┘
```

Stream metrics from child agents to a parent for centralized dashboarding and long-term storage.
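The child and parent sides are paired through a shared API key in each agent's `stream.conf`. The sketch below prints the two fragments. The function names, the host name, and the key are hypothetical placeholders, and the key is expected to be a UUID (for example from `uuidgen`).

```bash
# child_stream_conf PARENT_HOST API_KEY
# Print a stream.conf fragment for a child agent that sends to PARENT_HOST.
child_stream_conf() {
  printf '[stream]\n    enabled = yes\n    destination = %s:19999\n    api key = %s\n' "$1" "$2"
}

# parent_stream_conf API_KEY
# Print the matching stream.conf fragment that makes a parent accept that key.
parent_stream_conf() {
  printf '[%s]\n    enabled = yes\n' "$1"
}

# Example with placeholder values (the key is an arbitrary UUID, not a real one).
key="11111111-2222-3333-4444-555555555555"
child_stream_conf parent.example.internal "$key"
parent_stream_conf "$key"
```

Each fragment goes into `stream.conf` on the respective machine, followed by an agent restart. Printing the fragments instead of writing files keeps the sketch safe to run anywhere.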

## Netdata vs Alternatives

| Feature | Netdata | Prometheus+Grafana | Datadog | Zabbix |
|---------|---------|-------------------|---------|--------|
| Setup time | 1 minute | Hours | Minutes | Hours |
| Configuration | Zero-config | Extensive YAML | Agent config | Templates |
| Granularity | Per-second | 15-second typical (1-minute default) | 15-second | 1-minute |
| ML alerts | Built-in | No (manual rules) | Yes | No |
| Out-of-box metrics | 2000+ | Need exporters | Agent-based | Templates |
| Resource usage | ~1% CPU, 100MB | Varies | ~1% CPU | Varies |
| Dashboard | Built-in real-time | Grafana (separate) | Built-in | Built-in |

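## FAQ

**Q: How do I choose between Netdata and Prometheus + Grafana?**
A: Netdata suits fast deployment and real-time monitoring that works out of the box. Prometheus + Grafana suits cases that need long-term metric storage, custom queries (PromQL), and customized dashboards. The two can coexist: having Netdata export metrics to Prometheus is also a common architecture.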
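
For the coexistence case, the agent can expose its metrics in Prometheus text format. The sketch below counts the sample lines in such output, which is a quick way to check what a scrape would ingest. `count_samples` is a hypothetical helper, and the endpoint in the usage note assumes the agent's v1 API on the default port.

```bash
# count_samples: read Prometheus text-format output on stdin and print how many
# sample lines it contains. Comment lines starting with "#" and blank lines
# are skipped. (Hypothetical helper for illustration.)
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}
```

A typical use would be `curl -s 'http://localhost:19999/api/v1/allmetrics?format=prometheus' | count_samples`. A Prometheus scrape job would target the same path with the `format=prometheus` parameter.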

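**Q: Is Netdata Cloud required?**
A: No. Every Netdata agent has a complete local dashboard. Cloud is an optional SaaS service that provides a unified view across multiple servers. Self-hosted users can use a parent agent instead.

**Q: Does it have a large impact on server performance?**
A: Very little. In typical scenarios CPU usage is ~1% and memory is ~100-150MB. Netdata is written in efficient C and is specifically optimized for low-overhead collection.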
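
The overhead figures are easy to check on your own host. A minimal sketch that sums CPU and resident memory from `ps`-style output; `usage_summary` is a hypothetical helper, and the `ps` invocation in the usage note assumes Linux procps.

```bash
# usage_summary: read "<%cpu> <rss_kb>" pairs on stdin (one line per process)
# and print the summed CPU percentage and resident memory in MB.
# (Hypothetical helper for illustration.)
usage_summary() {
  awk '{ cpu += $1; rss += $2 }
       END { printf "cpu=%.1f%% rss=%.0fMB\n", cpu, rss / 1024 }'
}
```

On Linux, `ps -C netdata -o pcpu=,rss= | usage_summary` feeds it the agent's own processes. Note that `ps` reports CPU averaged over the process lifetime, and that collector plugins running under other process names are not included in this count.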

## Sources and Credits

- GitHub: [netdata/netdata](https://github.com/netdata/netdata) — 78.4K+ ⭐ | GPL-3.0
- Website: [netdata.cloud](https://netdata.cloud)

---
Source: https://tokrepo.com/en/workflows/ca4a8158-34bf-11f1-9bc6-00163e2b0d79
Author: Script Depot