Introduction
Puma is a Ruby HTTP server built for speed and concurrency. It uses a multi-threaded architecture to handle many requests simultaneously within a single process, and supports a clustered mode with multiple worker processes for leveraging multi-core CPUs. Puma is the default server for Ruby on Rails and works with any Rack-compatible framework.
What Puma Does
- Serves HTTP/1.1 requests for any Rack-compatible Ruby application
- Handles concurrent requests using threads within each worker process
- Supports a clustered mode with forked worker processes for multi-core utilization
- Provides zero-downtime restarts via phased restart and hot restart mechanisms
- Binds to TCP ports or Unix sockets with optional SSL/TLS termination
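A Rack application is any object whose call(env) method returns a status, a headers hash, and an enumerable body. That is what "Rack-compatible" means above. The sketch below is plain Ruby with no gems. HelloApp is a hypothetical name, and the lowercase header names follow the Rack 3 convention.

```ruby
# A minimal Rack application: any object that responds to #call(env)
# and returns [status, headers, body] is Rack-compatible.
class HelloApp
  def call(env)
    body = "Hello from #{env.fetch('PATH_INFO', '/')}\n"
    [200, { "content-type" => "text/plain" }, [body]]
  end
end

# Call the app directly with a hand-built env hash (no server needed).
if __FILE__ == $PROGRAM_NAME
  status, headers, body = HelloApp.new.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hi" })
  puts status
  puts headers["content-type"]
  puts body.join
end
```

To serve it with Puma, a config.ru could require this file and end with run HelloApp.new. Running bundle exec puma in that directory picks up config.ru by default.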
Architecture Overview
Puma accepts connections on a server thread and dispatches requests to a thread pool for processing. A reactor (built on the nio4r gem) holds connections that have not yet sent a complete request, such as those from slow clients, so pool threads are not tied up waiting on the network. In single mode, one process runs a configurable number of threads. In cluster mode, a master process forks multiple workers, each with its own thread pool. The master monitors the workers and automatically restarts any that crash. Puma can also write a state file and run a built-in control app for remote management. Request parsing is handled by a native C extension for performance.
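The following is a minimal sketch of the thread-pool half of that design. It uses only Ruby's core Queue and Thread. ToyThreadPool is a hypothetical name, and this is not Puma's code. Puma's real pool (Puma::ThreadPool) also grows and shrinks between the min and max thread counts and cooperates with the reactor, and none of that is modeled here.

```ruby
# Toy model of dispatching work to a fixed-size thread pool.
# NOT Puma's implementation; it only illustrates the queue + worker-threads idea.
class ToyThreadPool
  def initialize(size)
    @jobs = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # Each thread blocks on the queue until a job (or the :stop sentinel) arrives.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Enqueue a callable; the next idle thread picks it up.
  def <<(job)
    @jobs << job
  end

  # Push one sentinel per thread, then wait for every thread to finish.
  def shutdown
    @threads.size.times { @jobs << :stop }
    @threads.each(&:join)
  end
end
```

If each job calls sleep 0.1 to stand in for blocking I/O, ten jobs on a five-thread pool finish in roughly 0.2 seconds rather than 1 second. sleep, like blocking I/O, releases the GVL while it waits.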
Self-Hosting & Configuration
- Add Puma to your Gemfile and create a config/puma.rb configuration file
- Set workers to the number of CPU cores for cluster mode, or set the WEB_CONCURRENCY environment variable
- Configure threads with a min and max range (e.g., threads 5, 5)
- Bind to a Unix socket for reverse proxy setups or to a TCP port for direct serving
- Use preload_app! to share memory between workers via copy-on-write forking
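The steps above could translate into a config/puma.rb like the sketch below. This is a config fragment that Puma evaluates with its own DSL (workers, threads, bind, preload_app!), not a standalone script. The socket path is an example. RAILS_MAX_THREADS is the variable Rails' generated config conventionally reads. Etc.nprocessors counts logical processors, which may exceed physical cores.

```ruby
# config/puma.rb: illustrative sketch; adjust paths and counts for your deployment.
require "etc"

# Cluster mode: one worker per CPU core unless WEB_CONCURRENCY overrides it.
workers Integer(ENV.fetch("WEB_CONCURRENCY") { Etc.nprocessors })

# Min and max threads per worker (equal here, so the pool size is fixed).
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS") { 5 })
threads max_threads, max_threads

# Behind a reverse proxy, listen on a Unix socket (example path)...
bind "unix:///tmp/puma.sock"
# ...or, for direct serving, use a TCP port instead:
# bind "tcp://0.0.0.0:3000"

# Load the app in the master before forking so workers share memory via copy-on-write.
preload_app!
```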
Key Features
- Thread-safe concurrent request handling with configurable thread pool sizing
- Cluster mode with automatic worker management and memory-efficient forking
- Zero-downtime deploys through phased restart (rolling worker replacement)
- Built-in control server for runtime stats, restarts, and thread backtraces
- Default server in Ruby on Rails with tight framework integration
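The control server is switched on in config/puma.rb with activate_control_app, for example activate_control_app "tcp://127.0.0.1:9293", { auth_token: "secret" }. After that, pumactl or any HTTP client can query it. The sketch below fetches the JSON from /stats and totals the per-worker thread figures.

Some details are assumptions:

- The URL and token are placeholders.
- fetch_puma_stats and summarize_puma_stats are hypothetical helpers.
- The field names (worker_status, last_status, max_threads, pool_capacity, backlog) follow Puma's stats documentation as we understand it. Verify them against the Puma version you run.

```ruby
require "json"
require "net/http"
require "uri"

# Fetch /stats from Puma's control app. The URL and token must match whatever
# was configured with activate_control_app; the defaults here are placeholders.
def fetch_puma_stats(control_url = "http://127.0.0.1:9293", token = "secret")
  uri = URI.join(control_url, "/stats")
  uri.query = URI.encode_www_form(token: token)
  JSON.parse(Net::HTTP.get(uri))
end

# Total thread figures across workers. Assumed shape: cluster mode nests
# per-worker figures under worker_status -> last_status, while single mode
# reports them at the top level.
def summarize_puma_stats(stats)
  statuses =
    if stats.key?("worker_status")
      stats["worker_status"].map { |w| w.fetch("last_status", {}) }
    else
      [stats]
    end
  {
    workers: stats.fetch("workers", 0),
    max_threads: statuses.sum { |s| s.fetch("max_threads", 0) },
    pool_capacity: statuses.sum { |s| s.fetch("pool_capacity", 0) },
    backlog: statuses.sum { |s| s.fetch("backlog", 0) }
  }
end
```

fetch_puma_stats only works against a running Puma with the control app enabled. summarize_puma_stats can be exercised on any hash of the same shape.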
Comparison with Similar Tools
- Unicorn — multi-process, single-threaded Ruby server; simpler model but uses more memory and cannot handle slow clients efficiently
- Passenger — application server supporting Ruby, Python, and Node.js; offers more features (such as built-in load balancing), though some advanced options require the commercial edition
- Falcon — async Ruby server using fibers; better for I/O-heavy workloads but requires fiber-compatible code
- Thin — EventMachine-based Ruby server; lighter but single-threaded and less actively maintained
- Pitchfork — Unicorn fork by Shopify with improved memory management; copy-on-write optimized but still single-threaded per worker
FAQ
Q: How many threads and workers should I configure? A: A common starting point is one worker per CPU core and 5 threads per worker. On MRI (CRuby), the GVL lets only one thread per process execute Ruby code at a time. Threads therefore mainly help with I/O-bound work such as database queries and HTTP calls, while workers provide parallelism across cores. Adjust based on your application's memory usage and latency profile.
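The rule of thumb above can be turned into numbers with a small calculation, capped by how many workers fit in memory. starting_layout is a hypothetical helper. The memory budget and per-worker RSS are inputs you would measure yourself, and any figures used with it are illustrative. With preload_app!, copy-on-write sharing means naive per-worker RSS can overstate the real cost.

```ruby
require "etc"

# Hypothetical helper: turn "one worker per core, N threads per worker" into
# numbers, optionally capped by a memory budget you supply (in MB).
def starting_layout(cores: Etc.nprocessors, threads: 5, memory_budget_mb: nil, worker_rss_mb: nil)
  workers = cores
  if memory_budget_mb && worker_rss_mb
    # Never plan more workers than fit in memory, but always keep at least one.
    fit = [memory_budget_mb / worker_rss_mb, 1].max
    workers = [workers, fit].min
  end
  # max_in_flight: requests that can be processed at once (workers x threads).
  { workers: workers, threads: threads, max_in_flight: workers * threads }
end
```

For example, starting_layout(cores: 8, memory_budget_mb: 2048, worker_rss_mb: 400) plans 5 workers rather than 8, because only five 400 MB workers fit in 2048 MB.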
Q: Does Puma support HTTP/2? A: Puma serves HTTP/1.1 natively. For HTTP/2 support, place a reverse proxy like Nginx or Caddy in front of Puma to handle HTTP/2 termination.
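Q: How does zero-downtime restart work? A: In cluster mode, a phased restart replaces workers one at a time. Each old worker finishes its in-flight requests before shutting down, and a new worker boots with the updated code, so the remaining workers keep serving traffic throughout the deploy. Phased restarts are not compatible with preload_app!, because preloaded code lives in the master process and cannot be swapped out one worker at a time.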
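In Puma, a phased restart is triggered by sending SIGUSR1 to the master or by running pumactl phased-restart. The sketch below is a simplified in-memory model, not Puma code, and ToyWorker and phased_restart! are hypothetical names. It shows why serving capacity never drops by more than one worker during the rollout. In real Puma, each replacement is a new process with a new PID rather than the same object being updated.

```ruby
# Simplified model of a phased (rolling) restart.
ToyWorker = Struct.new(:index, :code_version, :serving)

# Replace workers one at a time; return the lowest number of workers that
# were serving at any point during the rollout.
def phased_restart!(workers, new_version)
  min_serving = workers.count(&:serving)
  workers.each do |worker|
    worker.serving = false             # old worker stops taking new requests and drains
    min_serving = [min_serving, workers.count(&:serving)].min
    worker.code_version = new_version  # replacement boots with the new code...
    worker.serving = true              # ...before the next worker is restarted
  end
  min_serving
end
```

With four workers, phased_restart! reports that at least three were serving throughout, and every worker ends on the new version.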
Q: Can Puma handle WebSocket connections? A: Puma supports Rack hijacking, which libraries like ActionCable use for WebSocket connections in Rails. However, dedicated WebSocket servers like AnyCable may handle higher connection counts more efficiently.
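The following is a minimal sketch of what "Rack hijacking" means. With a full hijack, the app takes the raw connection from the server and writes to it directly. Our reading of the Rack SPEC is that calling env["rack.hijack"] returns the underlying IO, and that the server ignores the app's return value afterwards. HijackApp is a hypothetical name. It writes a plain HTTP response rather than performing a WebSocket handshake or framing, which libraries like ActionCable handle.

```ruby
# Minimal Rack app that performs a full hijack: it takes over the raw socket
# and writes the HTTP response itself. Illustrative only, not a WebSocket handler.
class HijackApp
  RESPONSE = "HTTP/1.1 200 OK\r\n" \
             "Content-Type: text/plain\r\n" \
             "Connection: close\r\n" \
             "\r\n" \
             "hello from a hijacked socket\n"

  def call(env)
    hijack = env["rack.hijack"]
    # Servers without hijack support omit rack.hijack, so fall back to a normal response.
    return [501, { "content-type" => "text/plain" }, ["hijacking not supported\n"]] unless hijack

    io = hijack.call # the app now owns the connection
    begin
      io.write(RESPONSE)
    ensure
      io.close
    end
    # After a full hijack the server does not send this response.
    [200, {}, []]
  end
end
```

Because the app only relies on call and an IO-like object, it can be exercised without a server by passing an env whose rack.hijack returns a StringIO.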