Introduction
Apache SkyWalking is a top-level Apache Software Foundation project: an observability platform focused on distributed tracing, service-mesh telemetry, and application performance monitoring for cloud-native stacks. It was designed from day one around the complexity of microservices and service meshes, so it bundles tracing, metrics, logs, events, and alerting into a single backend instead of asking you to glue them together.
What SkyWalking Does
- Collects distributed traces from Java, .NET, Node.js, Python, Go, PHP, Rust, and Lua agents.
- Ingests OpenTelemetry, Zipkin, Jaeger, Prometheus, and eBPF-based profiling data out of the box.
- Builds topology maps of services, instances, endpoints, and external dependencies automatically.
- Correlates logs and traces via a shared trace/segment ID and searchable tag query language.
- Provides alerting, two analysis languages (OAL, the Observability Analysis Language, and MAL, the Meter Analysis Language), and dashboards in a single OAP backend.
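The shared trace and segment IDs mentioned above travel between services in a propagation header. As a minimal sketch, assuming the `sw8` header layout of SkyWalking's cross-process propagation protocol v3 (eight fields joined by `-`, with the string fields Base64-encoded), the following decodes a header so that a log line can be tagged with the same trace ID. The `Sw8Context` class and both helper functions are hypothetical names introduced for illustration; they are not part of any SkyWalking agent API.

```python
import base64
from dataclasses import dataclass


@dataclass
class Sw8Context:
    """Hypothetical container for the eight sw8 fields (illustration only)."""
    sampled: bool
    trace_id: str
    parent_segment_id: str
    parent_span_id: int
    parent_service: str
    parent_instance: str
    parent_endpoint: str
    target_address: str


def _enc(text: str) -> str:
    # Standard Base64 never emits '-', so '-' stays a safe field separator.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def _dec(text: str) -> str:
    return base64.b64decode(text).decode("utf-8")


def build_sw8(ctx: Sw8Context) -> str:
    """Serialize a context into the assumed sw8 wire format."""
    return "-".join([
        "1" if ctx.sampled else "0",
        _enc(ctx.trace_id),
        _enc(ctx.parent_segment_id),
        str(ctx.parent_span_id),
        _enc(ctx.parent_service),
        _enc(ctx.parent_instance),
        _enc(ctx.parent_endpoint),
        _enc(ctx.target_address),
    ])


def parse_sw8(header: str) -> Sw8Context:
    """Decode an sw8 header value; raises ValueError on a malformed header."""
    parts = header.split("-")
    if len(parts) != 8:
        raise ValueError(f"expected 8 sw8 fields, got {len(parts)}")
    return Sw8Context(
        sampled=parts[0] == "1",
        trace_id=_dec(parts[1]),
        parent_segment_id=_dec(parts[2]),
        parent_span_id=int(parts[3]),
        parent_service=_dec(parts[4]),
        parent_instance=_dec(parts[5]),
        parent_endpoint=_dec(parts[6]),
        target_address=_dec(parts[7]),
    )
```

In practice the language agents read and write this header for you; a parser like this is mainly useful when debugging propagation across a proxy or an uninstrumented hop.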
Architecture Overview
The core backend is called OAP (Observability Analysis Platform) and is written in Java. Agents and collectors push data to OAP via gRPC or HTTP. OAP runs that data through stream analysis pipelines, derives metrics from traces using the Observability Analysis Language (OAL), and persists the results to a pluggable storage layer (Elasticsearch, OpenSearch, BanyanDB, MySQL/PostgreSQL, TiDB). The web UI (Booster UI since SkyWalking 9.0, formerly RocketBot) and a GraphQL API sit on top. For service meshes, SkyWalking includes Envoy ALS receivers and Rover, an eBPF agent that profiles processes on Kubernetes nodes without any code changes.
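To make the GraphQL layer concrete, here is a minimal sketch of a client that asks OAP for its known services using only the Python standard library. Several details are assumptions to verify against your OAP version: the endpoint `http://localhost:12800/graphql` (the usual default REST port), the `listServices(layer: ...)` query from the v2 metadata schema, and the `GENERAL` layer name. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: OAP's GraphQL endpoint on its default REST port.
OAP_GRAPHQL_URL = "http://localhost:12800/graphql"

# Assumption: the v2 metadata query; older releases expose different queries.
LIST_SERVICES_QUERY = """
query ListServices($layer: String) {
  listServices(layer: $layer) {
    id
    name
  }
}
"""


def build_request(url: str, layer: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST carrying the GraphQL query."""
    body = json.dumps({
        "query": LIST_SERVICES_QUERY,
        "variables": {"layer": layer},
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_services(url: str = OAP_GRAPHQL_URL, layer: str = "GENERAL"):
    """Send the query and return the list of service objects."""
    with urllib.request.urlopen(build_request(url, layer), timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["listServices"]
```

Calling `list_services()` against a running OAP should return a list of dictionaries with `id` and `name` keys. The UI talks to the same API, so anything shown on a dashboard can in principle be retrieved this way.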
Self-Hosting & Configuration
- Helm chart: helm install skywalking oci://registry-1.docker.io/apache/skywalking-helm with value overrides for storage.
- For production, run BanyanDB or Elasticsearch 8.x; avoid H2 beyond evaluation.
- Scale OAP horizontally behind a headless service; SkyWalking can use ZooKeeper or Kubernetes (among other coordinators such as etcd, Consul, and Nacos) for cluster coordination.
- Tune core.recordDataTTL and core.metricsDataTTL to bound storage growth.
- Enable the alarm engine via alarm-settings.yml and plug in webhooks, Slack, DingTalk, or PagerDuty.
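For the webhook option, the receiving side can be small. The sketch below turns a webhook request body into one-line summaries. It assumes the payload is a JSON array of alarm objects with `scope`, `name`, `ruleName`, `alarmMessage`, `startTime` (epoch milliseconds), and `tags` fields, as in the example in SkyWalking's alarm documentation; confirm the field names against the OAP version you run. The function name `summarize_alarms` is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import List


def summarize_alarms(body: bytes) -> List[str]:
    """Convert an alarm webhook body into human-readable lines.

    Assumed payload shape: a JSON array of alarm objects, each with
    scope, name, ruleName, alarmMessage, startTime (ms), and tags.
    """
    lines = []
    for alarm in json.loads(body):
        # startTime is assumed to be epoch milliseconds.
        started = datetime.fromtimestamp(alarm["startTime"] / 1000, tz=timezone.utc)
        # tags arrive as a list of {"key": ..., "value": ...} pairs.
        tags = {tag["key"]: tag["value"] for tag in alarm.get("tags") or []}
        level = tags.get("level", "UNKNOWN")
        lines.append(
            "[{}] {} {}: {} (rule={}, since {})".format(
                level,
                alarm["scope"],
                alarm["name"],
                alarm["alarmMessage"],
                alarm["ruleName"],
                started.strftime("%Y-%m-%dT%H:%M:%SZ"),
            )
        )
    return lines
```

A function like this can sit behind any HTTP handler that accepts POST requests, which keeps the routing logic (chat, paging, ticketing) separate from the parsing.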
Key Features
- Native support for both agent-based instrumentation and service-mesh telemetry.
- eBPF profiling with Rover for CPU/off-CPU and network profiling without recompilation.
- Log/trace correlation and trace-to-metrics conversion via the OAL scripting language.
- Browser Real User Monitoring (RUM) agent for frontend performance data.
- BanyanDB: a purpose-built observability database written in Go, shipped by the same project.
Comparison with Similar Tools
- Jaeger: strong distributed tracing, but it lacks an integrated metrics and logs pipeline.
- Prometheus + Grafana + Loki + Tempo: a powerful stack, but you assemble and operate four systems.
- Elastic APM: tightly coupled to Elasticsearch; SkyWalking is storage-agnostic.
- Datadog APM: SaaS with per-host pricing; SkyWalking is self-hostable and Apache-licensed.
- SigNoz: a similar all-in-one goal at a smaller scale; SkyWalking has broader agent coverage.
FAQ
Q: Can SkyWalking ingest OpenTelemetry data?
A: Yes. The OpenTelemetry Collector can ship OTLP traces, metrics, and logs directly to OAP.
Q: Does the Java agent require code changes?
A: No. Attach it via -javaagent and it auto-instruments common frameworks.
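As an illustration of attaching the agent, the sketch below builds the launch command and environment for a Java service from Python, for example inside a deployment script. The jar paths are placeholders, and the helper name `build_launch` is hypothetical. The environment variables `SW_AGENT_NAME` and `SW_AGENT_COLLECTOR_BACKEND_SERVICES` are the ones referenced by the agent's default `agent.config`; treat the default OAP address `127.0.0.1:11800` as an assumption for a local setup.

```python
import os
from typing import Dict, List, Tuple


def build_launch(
    agent_jar: str,
    app_jar: str,
    service_name: str,
    oap_address: str = "127.0.0.1:11800",  # assumption: local OAP gRPC port
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting a JVM with the SkyWalking agent attached."""
    # -javaagent must come before -jar so the JVM loads the agent first.
    argv = ["java", "-javaagent:" + agent_jar, "-jar", app_jar]
    # Copy the current environment and add the agent settings on top.
    env = dict(os.environ)
    env["SW_AGENT_NAME"] = service_name
    env["SW_AGENT_COLLECTOR_BACKEND_SERVICES"] = oap_address
    return argv, env
```

Passing the result to `subprocess.run(argv, env=env, check=True)` starts the application with instrumentation enabled and no change to its source code.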
Q: What storage should I choose for production?
A: BanyanDB for observability-native workloads, or Elasticsearch/OpenSearch if you already run one.
Q: Is it viable for non-Java stacks?
A: Yes. Node.js, Python, Go (via the SkyAPM go2sky library or the newer skywalking-go agent), .NET Core, and PHP agents are first-class.