Scripts · April 23, 2026 · 1 min read

Logstash — Server-Side Data Processing Pipeline

Logstash is a data collection and processing engine that ingests logs, metrics, and events from diverse sources, transforms them through configurable filter plugins, and routes them to Elasticsearch or other destinations.


Introduction

Logstash is the data processing backbone of the Elastic Stack. It ingests data from hundreds of sources simultaneously, parses and enriches each event in real time, and routes the result to one or more outputs. It bridges the gap between raw data and actionable insights in Elasticsearch.

What Logstash Does

  • Ingests data from files, syslog, Kafka, Beats, HTTP, JDBC, and 50+ input plugins
  • Parses unstructured logs into structured fields using grok, dissect, and JSON filters
  • Enriches events with GeoIP lookups, DNS resolution, and external database joins
  • Routes events conditionally to different outputs based on field values or tags
  • Handles backpressure with persistent queues to prevent data loss
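Parsing and enrichment happen in the filter stage. As a minimal sketch, the grok and geoip filters from the list above might be combined like this (assuming Apache-style access logs in the `message` field; with ECS compatibility enabled the field names differ, e.g. `clientip` becomes a nested `source` field):

```conf
filter {
  # Parse the raw line into structured fields (clientip, verb, response, ...)
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Enrich with geolocation data looked up from the parsed client IP
  geoip {
    source => "clientip"
  }
}
```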

Architecture Overview

Logstash runs as a JVM-based process. A pipeline consists of three stages: inputs receive events, filters transform them, and outputs ship them. Events flow through an internal queue (in-memory or disk-backed persistent queue). Multiple pipelines can run in a single Logstash instance with isolated configurations. The pipeline compiler optimizes filter execution order.
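A complete pipeline config makes the three stages concrete. This sketch (hostnames and index name are placeholders) receives events from Beats, normalizes the timestamp, and ships to Elasticsearch:

```conf
input {
  # Listen for events shipped by Filebeat/Metricbeat
  beats {
    port => 5044
  }
}

filter {
  # Use the event's own timestamp field as @timestamp
  date {
    match => ["timestamp", "ISO8601"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```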

Self-Hosting & Configuration

  • Pipeline configs go in /etc/logstash/conf.d/ with .conf extension
  • Settings in logstash.yml control workers, batch size, and queue type
  • Enable persistent queues (queue.type: persisted) for durability across restarts
  • Use pipelines.yml to run multiple pipelines with separate configs and workers
  • Monitor via the Logstash Monitoring API at localhost:9600/_node/stats
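For multiple pipelines, pipelines.yml maps each pipeline ID to its config and per-pipeline settings that override logstash.yml. A sketch with two isolated pipelines (the IDs and paths are illustrative):

```yaml
- pipeline.id: apache-logs
  path.config: "/etc/logstash/conf.d/apache.conf"
  pipeline.workers: 4
  queue.type: persisted   # disk-backed queue survives restarts
- pipeline.id: metrics
  path.config: "/etc/logstash/conf.d/metrics.conf"
  pipeline.workers: 2
```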

Key Features

  • Grok: pattern-based parser with 120+ built-in patterns for common log formats
  • Dead letter queue: captures events that fail processing for later inspection
  • Pipeline-to-pipeline communication for complex routing topologies
  • Centralized pipeline management via Kibana when using Elastic Stack
  • Codec plugins (multiline, json_lines, avro) handle wire-format decoding at input
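The dead letter queue is enabled in logstash.yml (`dead_letter_queue.enable: true`) and can later be drained with the dedicated input plugin for reprocessing or inspection. A sketch, assuming the default DLQ path:

```conf
input {
  # Re-read events that previously failed processing
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    commit_offsets => true   # remember position across restarts
  }
}

output {
  # Inspect failed events on the console before fixing the pipeline
  stdout {
    codec => rubydebug
  }
}
```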

Comparison with Similar Tools

  • Fluent Bit — C-based, lower resource usage; Logstash offers richer transformation logic
  • Fluentd — Ruby-based, tag-routing model; Logstash has deeper Elastic Stack integration
  • Vector — Rust-based, faster throughput; Logstash has a larger filter plugin library
  • Apache NiFi — visual dataflow; Logstash is config-file driven and lighter to deploy

FAQ

Q: How much memory does Logstash need? A: The JVM defaults to 1 GB heap. Production deployments typically use 2-4 GB depending on pipeline complexity and throughput.
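Heap size is set in config/jvm.options; setting minimum and maximum to the same value avoids resize pauses. For example, to give Logstash a 2 GB heap:

```conf
# config/jvm.options — fix the heap at 2 GB
-Xms2g
-Xmx2g
```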

Q: Can Logstash output to something other than Elasticsearch? A: Yes. Outputs include Kafka, S3, Redis, stdout, HTTP, and many more. Multiple outputs per pipeline are supported.
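Conditional routing to multiple outputs uses `if`/`else` on field values in the output stage. A sketch (the `level` field, broker address, and topic are assumptions):

```conf
output {
  if [level] == "error" {
    # Send errors to a Kafka topic for alerting
    kafka {
      bootstrap_servers => "localhost:9092"
      topic_id => "errors"
    }
  } else {
    elasticsearch {
      hosts => ["http://localhost:9200"]
    }
  }
}
```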

Q: Is Logstash required for the Elastic Stack? A: No. Elastic Agent and Beats can ship data directly to Elasticsearch. Logstash is used when you need complex transformations or non-Elastic outputs.

Q: How do I parse custom log formats? A: Write a grok pattern, or use the dissect filter for delimiter-based parsing. Test patterns with the Grok Debugger in Kibana Dev Tools.
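For a fixed-delimiter format, dissect is faster than grok because it splits on literal separators instead of regex matching. A sketch for a line like `2024-01-15 12:00:00 INFO my.logger - message text` (the field names are illustrative):

```conf
filter {
  dissect {
    mapping => {
      # %{+ts} appends the time part to the date captured by %{ts}
      "message" => "%{ts} %{+ts} %{level} %{logger} - %{msg}"
    }
  }
}
```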

