Apr 20, 2026 · 3 min read

Hatchet — Durable Workflow Engine for Distributed Systems

Hatchet is an open-source durable workflow engine that replaces difficult-to-manage task queues with fault-tolerant, observable DAG-based workflows for background jobs and distributed task execution.

Introduction

Hatchet is a distributed workflow engine designed to replace brittle task queues like Celery, BullMQ, or SQS-based setups. It provides durable execution guarantees, automatic retries, DAG-based step orchestration, and a visual dashboard for monitoring — making it straightforward to build reliable background processing pipelines without custom retry logic or dead letter queue management.

What Hatchet Does

  • Orchestrates multi-step workflows as directed acyclic graphs with automatic dependency resolution
  • Provides durable execution with at-least-once delivery and configurable retry policies
  • Offers real-time workflow monitoring through a built-in web dashboard
  • Supports event-driven triggers, cron schedules, and programmatic workflow dispatch
  • Enables concurrency control with rate limiting and queue-level concurrency limits

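The DAG orchestration in the first bullet boils down to topological ordering: a step may run only once all of its parents have finished. A minimal sketch of that dependency resolution (illustrative only; the step names and dictionary shape are hypothetical, not Hatchet's API):

```python
from collections import deque

def topological_order(deps: dict[str, list[str]]) -> list[str]:
    """Return an execution order where every step runs after its parents.

    `deps` maps each step name to the list of steps it depends on.
    """
    indegree = {step: len(parents) for step, parents in deps.items()}
    children: dict[str, list[str]] = {step: [] for step in deps}
    for step, parents in deps.items():
        for parent in parents:
            children[parent].append(step)

    ready = deque(step for step, d in indegree.items() if d == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for child in children[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("cycle detected: a workflow must be a DAG")
    return order

# A fan-out/fan-in pipeline: fetch -> (parse_a, parse_b) -> merge
pipeline = {
    "fetch": [],
    "parse_a": ["fetch"],
    "parse_b": ["fetch"],
    "merge": ["parse_a", "parse_b"],
}
print(topological_order(pipeline))
```

Steps with no remaining unfinished parents can also be dispatched in parallel, which is how fan-out falls out of the same bookkeeping.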
Architecture Overview

Hatchet consists of a Go-based engine server backed by PostgreSQL for state persistence and a message queue (NATS or RabbitMQ) for task dispatch. Workers connect to the engine via gRPC and pull tasks from assigned queues. The engine manages workflow state transitions, retry scheduling, and timeout enforcement. SDKs in Python, TypeScript, and Go handle worker registration, step execution, and result reporting. The web UI reads from PostgreSQL to display workflow runs, step statuses, and logs.
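The dispatch loop described above (engine enqueues tasks, workers pull, execute, and report results) can be sketched as a toy in-memory simulation; this is a model of the pull pattern, not the real gRPC protocol or queue broker:

```python
import queue

def run_workers(tasks: list[dict], handlers: dict) -> dict:
    """Toy simulation of the pull model: the 'engine' enqueues tasks,
    a 'worker' pulls each one, runs its handler, and reports the result."""
    inbox: queue.Queue = queue.Queue()
    for task in tasks:                 # engine side: dispatch onto the queue
        inbox.put(task)

    results = {}
    while not inbox.empty():           # worker side: pull, execute, report
        task = inbox.get()
        handler = handlers[task["action"]]
        results[task["id"]] = handler(task["payload"])
    return results

handlers = {"double": lambda x: x * 2}
out = run_workers([{"id": "t1", "action": "double", "payload": 21}], handlers)
print(out)  # {'t1': 42}
```

In the real system the queue is NATS or RabbitMQ, results go back over gRPC, and the engine persists every state transition to PostgreSQL so a run survives process crashes.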

Self-Hosting & Configuration

  • Deploy via Docker Compose for development or Helm chart for Kubernetes production use
  • Requires PostgreSQL 14+ for workflow state and a message broker (NATS recommended)
  • Configure workers with environment variables: HATCHET_CLIENT_TOKEN and HATCHET_CLIENT_TLS_ROOT_CA_FILE
  • Set retry policies, timeouts, and concurrency limits per workflow or per step
  • Monitor workflows through the built-in dashboard on port 8080
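The two environment variables named above are all a worker needs to connect; a small sketch of reading them at startup (the validation logic and returned dict shape are illustrative, not the SDK's actual config loader):

```python
import os

def load_client_config(env=os.environ) -> dict:
    """Collect worker connection settings from the environment.

    HATCHET_CLIENT_TOKEN is treated as required here; the TLS root CA
    path is optional (assumption for this sketch, not official behavior).
    """
    token = env.get("HATCHET_CLIENT_TOKEN")
    if not token:
        raise RuntimeError("HATCHET_CLIENT_TOKEN must be set")
    return {
        "token": token,
        "tls_root_ca_file": env.get("HATCHET_CLIENT_TLS_ROOT_CA_FILE"),
    }

cfg = load_client_config({"HATCHET_CLIENT_TOKEN": "abc123"})
print(cfg["token"])  # abc123
```

Failing fast on a missing token at startup is preferable to a worker that boots and then silently fails every gRPC call.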

Key Features

  • DAG-based workflow definition with conditional branching and fan-out/fan-in patterns
  • Built-in rate limiting and concurrency control at workflow and step levels
  • Event-driven triggers with support for webhook, cron, and manual dispatch
  • Multi-tenant architecture with isolated queues and RBAC
  • SDKs for Python, TypeScript, and Go with decorator-based workflow definitions
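Rate limiting of the kind listed above is classically implemented as a token bucket: tokens refill at a steady rate up to a burst capacity, and each dispatch spends one. A minimal sketch (this is the general mechanism, not Hatchet's implementation):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: allows bursts up to `capacity`,
    then throttles to `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, now=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def allow(self) -> bool:
        current = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=0.0, capacity=2)   # no refill: only the burst passes
print([bucket.allow() for _ in range(3)])    # [True, True, False]
```

Queue-level concurrency limits are the complementary control: a rate limit bounds starts per second, while a concurrency limit bounds how many steps run at once.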

Comparison with Similar Tools

  • Temporal — more mature durable workflow platform but heavier to operate and learn
  • Inngest — similar event-driven approach with a managed cloud offering, less self-host flexibility
  • Celery — established Python task queue lacking workflow orchestration and visual monitoring
  • BullMQ — Redis-backed Node.js queue without built-in DAG support or durability guarantees
  • Prefect — Python-focused data pipeline orchestrator with different deployment model

FAQ

Q: How does Hatchet differ from a task queue like Celery? A: Hatchet provides workflow-level orchestration (DAGs, retries, timeouts, concurrency) with a visual dashboard, whereas Celery focuses on individual task dispatch without built-in workflow primitives.

Q: Can Hatchet replace Temporal? A: For many use cases, yes. Hatchet offers a simpler API and faster setup, though Temporal has a more mature ecosystem for complex saga patterns and multi-language replay.

Q: What happens if a worker crashes mid-step? A: The engine detects the timeout, marks the step as failed, and reschedules it according to the configured retry policy on an available worker.
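The recovery behavior in this answer amounts to retry-with-backoff: the failed attempt is rescheduled with an increasing delay until it succeeds or the retry budget runs out. A sketch of that policy (the delays and budget here are made-up values; in the real engine the delay is a scheduled requeue onto another worker, not an in-process sleep):

```python
import time

def run_with_retries(step, max_retries: int = 3, base_delay: float = 0.01):
    """Re-run a failed step with exponential backoff until it succeeds
    or the retry budget is exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception:
            if attempt == max_retries:
                raise                 # budget spent: surface the failure
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s...

calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:                # simulate two crashed attempts
        raise RuntimeError("worker lost mid-step")
    return "done"

print(run_with_retries(flaky_step))   # done (after two retries)
```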

Q: Is there a managed cloud version? A: Yes. Hatchet offers a managed cloud service, but the open-source version is fully self-hostable with no feature gating.
