# Puma — Fast Concurrent Web Server for Ruby and Rack

> Puma is a high-performance, multi-threaded HTTP server for Ruby web applications. It serves Rack-compatible frameworks such as Rails, Sinatra, and Hanami with low memory usage and high concurrency, and it is the default web server for Ruby on Rails.

## Quick Use

```bash
gem install puma

# Or add it to your Gemfile:
# gem "puma"

# Start with an explicit config file:
puma -C config/puma.rb

# Or, in a Rails app (Puma is the default server):
rails server
```

## Introduction

Puma is a Ruby HTTP server built for speed and concurrency. It uses a multi-threaded architecture to handle many requests simultaneously within a single process. It also supports a clustered mode that runs multiple worker processes to take advantage of multi-core CPUs. Puma is the default server for Ruby on Rails and works with any Rack-compatible framework.

## What Puma Does

- Serves HTTP/1.1 requests for any Rack-compatible Ruby application
- Handles concurrent requests using threads within each worker process
- Supports a clustered mode with forked worker processes for multi-core utilization
- Supports hot restarts and, in cluster mode, phased restarts for zero-downtime deploys
- Binds to TCP ports or Unix sockets, with optional SSL/TLS termination

## Architecture Overview

Puma accepts connections and dispatches requests to a thread pool for processing. A reactor buffers requests from slow clients until they are complete, so pool threads are only occupied by requests that are ready to run. In single mode, one process runs a configurable number of threads. In cluster mode, a master process forks multiple workers, each with its own thread pool. The master monitors the workers and restarts any that crash. Puma also provides a state file and a built-in control app for remote management. HTTP request parsing is handled by a native C extension for performance.
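The thread-pool dispatch described above can be sketched in plain Ruby. This is a simplified illustration, not Puma's implementation. `APP` and `serve_with_thread_pool` are hypothetical names. Requests are represented as bare Rack-style `env` hashes rather than parsed HTTP, and the reactor, sockets, and C parser are omitted. The sketch shows a fixed number of threads pulling work from a shared queue and calling a Rack-compatible app. Such an app responds to `call(env)` and returns `[status, headers, body]`.

```ruby
# Simplified sketch of thread-pool dispatch (not Puma's actual code).
# Hypothetical names: APP, serve_with_thread_pool.

# A Rack-compatible app: #call(env) returns [status, headers, body],
# where body responds to #each.
APP = lambda do |env|
  [200, { "content-type" => "text/plain" }, ["Hello from #{env['PATH_INFO']}"]]
end

# Runs each env hash through `app` on a pool of `max_threads` threads and
# returns the response bodies in the same order as `requests`.
def serve_with_thread_pool(app, requests, max_threads: 5)
  jobs = Queue.new                     # thread-safe FIFO shared by the pool
  results = Array.new(requests.size)
  lock = Mutex.new                     # guards writes to `results`

  # Enqueue each request with its index so responses keep their order.
  requests.each_with_index { |env, i| jobs << [env, i] }
  # One :stop sentinel per thread, so every thread exits once work runs out.
  max_threads.times { jobs << :stop }

  pool = Array.new(max_threads) do
    Thread.new do
      # Queue#pop blocks until an item is available.
      while (job = jobs.pop) != :stop
        env, i = job
        _status, _headers, body = app.call(env)
        chunks = []
        body.each { |chunk| chunks << chunk }
        body.close if body.respond_to?(:close) # Rack servers must close bodies
        lock.synchronize { results[i] = chunks.join }
      end
    end
  end

  pool.each(&:join) # wait until every thread has drained the queue
  results
end

requests = [{ "PATH_INFO" => "/a" }, { "PATH_INFO" => "/b" }, { "PATH_INFO" => "/c" }]
p serve_with_thread_pool(APP, requests, max_threads: 2)
# prints ["Hello from /a", "Hello from /b", "Hello from /c"]
```

Puma sizes its real pool with the `threads min, max` setting. The pool grows toward the maximum under load and trims idle threads back toward the minimum. The sketch uses a fixed size for simplicity.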
## Self-Hosting & Configuration

- Add Puma to your Gemfile and create a `config/puma.rb` configuration file
- Set the `workers` count to match your CPU cores for cluster mode, or set it through the `WEB_CONCURRENCY` environment variable
- Configure `threads` with a minimum and maximum (e.g., `threads 5, 5`)
- Bind to a Unix socket for reverse proxy setups, or to a TCP port for direct serving
- Use `preload_app!` to load the application before forking, so workers share memory through copy-on-write

## Key Features

- Thread-safe concurrent request handling with configurable thread pool sizing
- Cluster mode with automatic worker management and memory-efficient forking
- Zero-downtime deploys through phased restarts (rolling worker replacement)
- Built-in control server for runtime stats, restarts, and thread backtraces
- Default server in Ruby on Rails, with tight framework integration

## Comparison with Similar Tools

- **Unicorn**: multi-process, single-threaded Ruby server; a simpler model, but it uses more memory and cannot handle slow clients efficiently
- **Passenger**: application server supporting Ruby, Python, and Node.js; offers more features (such as built-in load balancing), but some advanced options require the commercial edition
- **Falcon**: async Ruby server built on fibers; better suited to I/O-heavy workloads, and it benefits most when the application uses fiber-aware (non-blocking) libraries
- **Thin**: EventMachine-based Ruby server; lighter, but single-threaded and less actively maintained
- **Pitchfork**: Unicorn fork by Shopify with improved memory management; optimized for copy-on-write, but still single-threaded per worker

## FAQ

**Q: How many threads and workers should I configure?**

A: A common starting point is one worker per CPU core and 5 threads per worker. On MRI (CRuby), extra threads mainly help with I/O-bound work. The GVL prevents Ruby code from running on more than one core at a time within a process. Adjust these numbers based on your application's memory usage and latency profile.

**Q: Does Puma support HTTP/2?**

A: Puma serves HTTP/1.1 natively. For HTTP/2, place a reverse proxy such as Nginx or Caddy in front of Puma to terminate HTTP/2 connections.
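The configuration points from this page can be combined for the reverse-proxy setup. A `config/puma.rb` for cluster mode behind Nginx or Caddy might look like the following sketch. Several values are illustrative assumptions to adapt: the socket path, the fallback of 2 workers, and the use of `RAILS_MAX_THREADS`. That variable is a convention from Rails-generated configs, not a Puma setting.

```ruby
# config/puma.rb: illustrative sketch; adjust values for your app and host.

# Cluster mode: number of forked workers. The fallback of 2 is an assumption;
# a common choice is one worker per CPU core.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# Per-worker thread pool (min, max). RAILS_MAX_THREADS is a Rails convention.
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# Load the app in the master before forking so workers share memory
# through copy-on-write.
preload_app!

# Listen on a Unix socket for the reverse proxy (hypothetical path).
bind "unix:///tmp/puma.sock"

environment ENV.fetch("RACK_ENV", "production")
```

The proxy then forwards traffic to that socket and handles HTTP/2 and TLS on the client side. In Nginx, that means an `upstream` block with `server unix:/tmp/puma.sock;`.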
**Q: How does zero-downtime restart work?**

A: In cluster mode, a phased restart replaces workers one at a time. Each old worker finishes its in-flight requests before shutting down, while a new worker boots with the updated code. Because the remaining workers keep serving traffic, the application stays available throughout the deploy.

**Q: Can Puma handle WebSocket connections?**

A: Yes. Puma supports Rack hijacking, which libraries such as Action Cable use for WebSocket connections in Rails. However, dedicated WebSocket servers such as AnyCable may handle large numbers of concurrent connections more efficiently.

## Sources

- https://github.com/puma/puma
- https://puma.io/

---

Source: https://tokrepo.com/en/workflows/asset-b97fd463

Author: AI Open Source