Configs · May 14, 2026 · 3 min read

Puma — Fast Concurrent Web Server for Ruby and Rack

Puma is a multi-threaded HTTP server for Ruby web applications. It serves Rack-compatible frameworks such as Rails, Sinatra, and Hanami, and it has been the default web server for Ruby on Rails since Rails 5. Because it handles concurrent requests with threads inside each process, it typically needs less memory than a purely multi-process server at the same level of concurrency.

Universal CLI command
npx tokrepo install b97fd463-4f6e-11f1-9bc6-00163e2b0d79

Introduction

Puma is a Ruby HTTP server built for speed and concurrency. It uses a thread pool to handle many requests at once within a single process, and it offers a cluster mode that forks multiple worker processes to make use of multi-core CPUs. It works with any Rack-compatible framework.

What Puma Does

  • Serves HTTP/1.1 requests for any Rack-compatible Ruby application
  • Handles concurrent requests using threads within each worker process
  • Supports a clustered mode with forked worker processes for multi-core utilization
  • Restarts with little or no downtime: a phased restart replaces cluster workers one at a time, and a hot restart re-executes the server while keeping its listening sockets open
  • Binds to TCP ports or Unix sockets with optional SSL/TLS termination
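
To make "Rack-compatible" concrete, here is a minimal sketch of a Rack application in plain Ruby. A Rack app is any object that responds to `call(env)` and returns a status, a headers hash, and a body. The name `HelloApp` is hypothetical and chosen only for illustration.

```ruby
# A Rack application: responds to #call(env) and returns
# a three-element array of [status, headers, body].
HelloApp = lambda do |env|
  path = env.fetch("PATH_INFO", "/")
  body = "Hello from #{path}\n"
  headers = {
    "content-type" => "text/plain",
    "content-length" => body.bytesize.to_s
  }
  [200, headers, [body]]
end

# In a config.ru file the app would be handed to the server with:
#   run HelloApp
```

With a `config.ru` like the one in the trailing comment, running `puma` in the same directory would normally pick the file up and serve the app.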

Architecture Overview

Puma uses a reactor pattern for accepting connections and dispatches them to a thread pool for processing. In single mode, one process runs a configurable number of threads. In cluster mode, a master process forks multiple workers, each with its own thread pool. Workers are monitored and automatically restarted if they crash. Puma includes a built-in state file and control app for remote management. Request parsing is handled by a native C extension for performance.
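
The dispatch step described above can be pictured with a toy thread pool in plain Ruby. This is a sketch for intuition only and not Puma's implementation: `TinyThreadPool` and `handle_all` are hypothetical names, the "requests" are plain strings, and there is no socket handling or reactor involved.

```ruby
# Toy model only: a fixed set of threads pulling jobs from a shared queue.
class TinyThreadPool
  def initialize(size)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # Each thread blocks on the queue until a job or a stop marker arrives.
        while (job = @queue.pop) != :stop
          job.call
        end
      end
    end
  end

  # Enqueue a block of work for whichever thread becomes free first.
  def submit(&job)
    @queue << job
  end

  # Send one stop marker per thread, then wait for all of them to finish.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each(&:join)
  end
end

# Pretend each string is an accepted request and process them all in the pool.
def handle_all(requests, pool_size:)
  results = Queue.new
  pool = TinyThreadPool.new(pool_size)
  requests.each do |req|
    pool.submit { results << "handled #{req}" }
  end
  pool.shutdown
  Array.new(results.size) { results.pop }.sort
end
```

Calling `handle_all(%w[a b c], pool_size: 2)` runs three jobs on two threads. The results are sorted because completion order across threads is not deterministic.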

Self-Hosting & Configuration

  • Add Puma to your Gemfile and create a config/puma.rb configuration file
  • Set the worker count to match the number of CPU cores for cluster mode, either in the config file or through the WEB_CONCURRENCY environment variable
  • Configure threads with a min and max range (e.g., threads 5, 5)
  • Bind to a Unix socket for reverse proxy setups or a TCP port for direct serving
  • Use preload_app! to load the application before forking so that workers share memory via copy-on-write (note that preload_app! conflicts with phased restart)
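
The settings listed above could be combined in a `config/puma.rb` along the following lines. This is a sketch with illustrative values, not a recommended production configuration. The socket and pidfile paths are hypothetical, and `RAILS_MAX_THREADS` is the environment variable conventionally used in Rails apps, assumed here.

```ruby
# config/puma.rb (illustrative values; tune for your own application)

# Worker processes for cluster mode. The fallback of 2 is an arbitrary example.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# Minimum and maximum threads per worker.
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# Load the app in the master process before forking so workers can share
# memory through copy-on-write. This conflicts with phased restart.
preload_app!

# Pick one: a Unix socket behind a reverse proxy, or a TCP port.
bind "unix:///var/run/puma.sock" # hypothetical socket path
# port Integer(ENV.fetch("PORT", 3000))

# Write the master process ID to a file (hypothetical path).
pidfile "tmp/pids/puma.pid"
```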

Key Features

  • Thread-safe concurrent request handling with configurable thread pool sizing
  • Cluster mode with automatic worker management and memory-efficient forking
  • Zero-downtime deploys through phased restart (rolling worker replacement)
  • Built-in control server for runtime stats, restarts, and thread backtraces
  • Default server in Ruby on Rails with tight framework integration
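
As a rough sketch of how the control server might be switched on, the fragment below adds two directives to `config/puma.rb`. The socket path, state file path, and token value are hypothetical placeholders.

```ruby
# config/puma.rb (fragment)

# Record the running server's details so that pumactl can find it.
state_path "tmp/pids/puma.state" # hypothetical path

# Start the control app on a Unix socket, protected by a token.
activate_control_app "unix:///var/run/pumactl.sock", { auth_token: "change-me" }

# With the state file in place, the bundled pumactl tool can query the server:
#   pumactl -S tmp/pids/puma.state stats
#   pumactl -S tmp/pids/puma.state phased-restart
```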

Comparison with Similar Tools

  • Unicorn — multi-process, single-threaded Ruby server; simpler model but uses more memory and cannot handle slow clients efficiently
  • Passenger — application server supporting Ruby, Python, and Node.js; more features (like built-in load balancing) but commercial for advanced options
  • Falcon — async Ruby server using fibers; better for I/O-heavy workloads but requires fiber-compatible code
  • Thin — EventMachine-based Ruby server; lighter but single-threaded and less actively maintained
  • Pitchfork — Unicorn fork by Shopify with improved memory management; copy-on-write optimized but still single-threaded per worker

FAQ

Q: How many threads and workers should I configure?
A: A common starting point is one worker per CPU core and 5 threads per worker. For MRI Ruby, threads help with I/O-bound work since the GVL limits CPU parallelism. Adjust based on your application's memory usage and latency profile.
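
As a worked example of that starting point, the sketch below multiplies workers by threads to get the upper bound on requests being processed at once. `puma_capacity` is a hypothetical helper, not part of Puma.

```ruby
require "etc"

# Upper bound on in-flight requests for a given worker and thread count.
def puma_capacity(workers:, threads_per_worker:)
  {
    workers: workers,
    threads_per_worker: threads_per_worker,
    max_concurrent_requests: workers * threads_per_worker
  }
end

# Starting point from the answer above: one worker per core, 5 threads each.
plan = puma_capacity(workers: Etc.nprocessors, threads_per_worker: 5)
puts "#{plan[:workers]} workers x #{plan[:threads_per_worker]} threads = " \
     "#{plan[:max_concurrent_requests]} concurrent requests at most"
```

On a 4-core machine this gives 4 workers and at most 20 requests in flight. Memory use grows with the worker count, so the real ceiling depends on the application.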

Q: Does Puma support HTTP/2?
A: Puma serves HTTP/1.1 natively. For HTTP/2 support, place a reverse proxy like Nginx or Caddy in front of Puma to handle HTTP/2 termination.

Q: How does zero-downtime restart work?
A: In cluster mode, a phased restart replaces workers one at a time. Each old worker finishes its current requests before shutting down, while a new worker boots with updated code. This keeps the application available throughout the deploy.
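
Restarts are commonly triggered by sending a signal to the Puma master process: USR1 requests a phased restart and USR2 a regular restart. The helper below is a minimal sketch of doing that from Ruby. `signal_puma` and the pidfile location are assumptions made for illustration, and the pidfile is expected to hold the master's process ID.

```ruby
# Signals understood by the Puma master process for restarts.
PUMA_RESTART_SIGNALS = {
  phased_restart: "USR1", # rolling replacement of workers (cluster mode)
  hot_restart: "USR2"     # restart the server, reloading its config file
}.freeze

# Read the master PID from a pidfile and send the matching signal.
# Returns the name of the signal that was sent.
def signal_puma(action, pidfile)
  signal = PUMA_RESTART_SIGNALS.fetch(action)
  pid = Integer(File.read(pidfile).strip)
  Process.kill(signal, pid)
  signal
end

# Example (hypothetical pidfile path):
#   signal_puma(:phased_restart, "tmp/pids/puma.pid")
```

`fetch` raises a `KeyError` for an unknown action, so a typo fails loudly instead of sending the wrong signal.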

Q: Can Puma handle WebSocket connections?
A: Puma supports Rack hijacking, which libraries like Action Cable use for WebSocket connections in Rails. However, dedicated WebSocket servers like AnyCable may handle higher connection counts more efficiently.
