# Logstash — Server-Side Data Processing Pipeline

> Logstash is a data collection and processing engine that ingests logs, metrics, and events from diverse sources, transforms them through configurable filter plugins, and routes them to Elasticsearch or other destinations.

## Quick Use

```bash
# Install on Debian/Ubuntu
sudo apt-get install logstash

# Run a simple stdin-to-stdout pipeline
echo 'hello world' | logstash -e "input { stdin {} } output { stdout {} }"

# Run with a pipeline config file
logstash -f /etc/logstash/conf.d/my-pipeline.conf

# Start as a service
sudo systemctl start logstash
```

## Introduction

Logstash is the data processing backbone of the Elastic Stack. It ingests data from hundreds of sources simultaneously, parses and enriches each event in real time, and routes the result to one or more outputs. It bridges the gap between raw data and actionable insights in Elasticsearch.

## What Logstash Does

- Ingests data from files, syslog, Kafka, Beats, HTTP, JDBC, and 50+ input plugins
- Parses unstructured logs into structured fields using grok, dissect, and JSON filters
- Enriches events with GeoIP lookups, DNS resolution, and external database joins
- Routes events conditionally to different outputs based on field values or tags
- Handles backpressure with persistent queues to prevent data loss

## Architecture Overview

Logstash runs as a JVM-based process. A pipeline consists of three stages: inputs receive events, filters transform them, and outputs ship them. Events flow through an internal queue (in-memory or disk-backed persistent queue). Multiple pipelines can run in a single Logstash instance with isolated configurations. The pipeline compiler optimizes filter execution order.
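The three stages map directly onto the sections of a pipeline config file. A minimal sketch (file path, port, index name, and field choices are illustrative, not prescriptive):

```conf
# /etc/logstash/conf.d/my-pipeline.conf — illustrative three-stage pipeline
input {
  beats { port => 5044 }            # stage 1: receive events from Filebeat
}

filter {
  grok {
    # stage 2: parse the raw line into structured fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip { source => "clientip" }    # enrich with GeoIP fields
}

output {
  elasticsearch {
    # stage 3: ship to a daily Elasticsearch index
    hosts => ["localhost:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
  }
}
```

Each top-level block may list multiple plugins; events pass through the filter plugins in the order they appear in the file.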
## Self-Hosting & Configuration

- Pipeline configs go in `/etc/logstash/conf.d/` with `.conf` extension
- Settings in `logstash.yml` control workers, batch size, and queue type
- Enable persistent queues (`queue.type: persisted`) for durability across restarts
- Use `pipelines.yml` to run multiple pipelines with separate configs and workers
- Monitor via the Logstash Monitoring API at `localhost:9600/_node/stats`

## Key Features

- Grok: pattern-based parser with 120+ built-in patterns for common log formats
- Dead letter queue: captures events that fail processing for later inspection
- Pipeline-to-pipeline communication for complex routing topologies
- Centralized pipeline management via Kibana when using Elastic Stack
- Codec plugins (multiline, json_lines, avro) handle wire-format decoding at input

## Comparison with Similar Tools

- **Fluent Bit** — C-based, lower resource usage; Logstash offers richer transformation logic
- **Fluentd** — Ruby-based, tag-routing model; Logstash has deeper Elastic Stack integration
- **Vector** — Rust-based, faster throughput; Logstash has a larger filter plugin library
- **Apache NiFi** — visual dataflow; Logstash is config-file driven and lighter to deploy

## FAQ

**Q: How much memory does Logstash need?**
A: The JVM defaults to 1 GB heap. Production deployments typically use 2-4 GB depending on pipeline complexity and throughput.

**Q: Can Logstash output to something other than Elasticsearch?**
A: Yes. Outputs include Kafka, S3, Redis, stdout, HTTP, and many more. Multiple outputs per pipeline are supported.

**Q: Is Logstash required for the Elastic Stack?**
A: No. Elastic Agent and Beats can ship data directly to Elasticsearch. Logstash is used when you need complex transformations or non-Elastic outputs.

**Q: How do I parse custom log formats?**
A: Write a grok pattern or use the dissect filter for delimiter-based parsing. Test patterns with the Grok Debugger in Kibana Dev Tools.
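As a concrete example of the grok approach from the FAQ, a filter for a hypothetical app log line such as `2024-05-01 12:00:00 ERROR payment failed` could look like this (the log format, field names, and date pattern are assumptions for illustration):

```conf
filter {
  grok {
    # split the hypothetical line into timestamp, level, and message fields
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  date {
    # set @timestamp from the parsed field; pattern must match the source format
    match => ["ts", "yyyy-MM-dd HH:mm:ss"]
  }
}
```

For purely delimiter-separated formats, the dissect filter does the same split without regular expressions and at lower CPU cost.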
## Sources

- https://github.com/elastic/logstash
- https://www.elastic.co/logstash