May 1, 2026 · 3 min read

Petals — Run LLMs at Home, BitTorrent-Style

A decentralized system for running large language models collaboratively across consumer hardware. Distributes model layers across peers for inference and fine-tuning.

Introduction

Petals enables running 100B+ parameter language models by splitting them across multiple consumer-grade machines connected over the internet. In a design inspired by BitTorrent, each participant hosts a subset of model layers while the system routes inference through available peers, making large-scale models accessible without enterprise hardware.

What Petals Does

  • Distributes large language model layers across multiple peers on the internet
  • Enables inference on 100B+ parameter models using commodity GPUs
  • Supports fine-tuning via parameter-efficient methods like adapters and prompt tuning
  • Provides a Hugging Face-compatible API for drop-in integration (a client sketch follows this list)
  • Runs both public swarms and private clusters for controlled deployments
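Because the client API mirrors Hugging Face Transformers, a swarm-backed model drops into ordinary generation code. A minimal client sketch, following the usage pattern in the Petals README; the model name and prompt are placeholders:

    # Minimal Petals client: the model class comes from petals, the rest is
    # standard Hugging Face usage. Requires `pip install petals`.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "bigscience/bloom"  # placeholder; use any model the swarm serves
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("A quick test prompt", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=16)
    print(tokenizer.decode(outputs[0]))

In Petals' design, the heavy Transformer blocks execute on remote peers while the client handles the lightweight input and output layers locally.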

Architecture Overview

Petals partitions a model's Transformer layers across a network of servers. When a client sends a request, the system routes hidden states sequentially through peers hosting consecutive layer ranges. A DHT-based routing protocol discovers available servers and balances load. Each peer only needs enough GPU memory for its assigned layers, so a 176B parameter model can run across a handful of consumer GPUs.
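The routing idea can be shown with a toy sketch. This is a conceptual illustration only, not Petals' actual internals: peers advertise which contiguous layer range they serve, and the client hops from peer to peer until every layer has been applied.

    # Toy model of Petals-style routing (illustration, not the real code).
    def make_layer(i):
        return lambda h: h + [i]  # stand-in "layer" that records its index

    NUM_LAYERS = 8
    layers = [make_layer(i) for i in range(NUM_LAYERS)]
    peers = {  # peer id -> (first layer, last layer exclusive)
        "peer-a": (0, 3),
        "peer-b": (3, 6),
        "peer-c": (6, 8),
    }

    def run_on_peer(peer_id, hidden):
        start, end = peers[peer_id]
        for layer in layers[start:end]:
            hidden = layer(hidden)
        return hidden

    def route(hidden):
        pos = 0
        while pos < NUM_LAYERS:
            # Stand-in for the DHT lookup: find a peer serving layer `pos`.
            peer_id = next(p for p, (s, e) in peers.items() if s <= pos < e)
            hidden = run_on_peer(peer_id, hidden)
            pos = peers[peer_id][1]
        return hidden

    print(route([]))  # [0, 1, 2, 3, 4, 5, 6, 7]: every layer applied once, in order

In the real system the lookup consults the DHT, and fault tolerance comes from repeating the lookup with a different peer when one goes offline.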

Self-Hosting & Configuration

  • Install via pip: pip install petals on Python 3.8+
  • Run a server with python -m petals.cli.run_server bigscience/bloom --num_blocks 12
  • Each server hosts a configurable number of Transformer blocks based on available VRAM
  • Join the public swarm automatically or configure a private swarm with --initial_peers (client-side sketch after this list)
  • Monitor server health and swarm status via the Petals health dashboard
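To keep traffic inside a private swarm, clients pass the same bootstrap multiaddrs the servers were started with. A hedged sketch assuming the initial_peers keyword documented by Petals; the address below is a placeholder to replace with the one your first server prints at startup:

    from petals import AutoDistributedModelForCausalLM

    # Placeholder multiaddr; copy the real one from your bootstrap server's logs.
    INITIAL_PEERS = ["/ip4/10.0.0.1/tcp/31337/p2p/QmPeerIdPlaceholder"]

    model = AutoDistributedModelForCausalLM.from_pretrained(
        "bigscience/bloom",            # must match the model your servers host
        initial_peers=INITIAL_PEERS,   # route only through the private swarm
    )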

Key Features

  • Run 100B+ models on hardware that could never fit them locally
  • Up to 10x faster than offloading-based approaches for distributed inference
  • Fine-tune with LoRA or prompt tuning across the distributed network (see the prompt-tuning sketch after this list)
  • Fault-tolerant routing automatically reroutes around offline peers
  • Compatible with Hugging Face generate API and chat templates
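Fine-tuning works by keeping the remote layers frozen and training a small set of parameters on the client. A sketch of prompt tuning, assuming the tuning_mode="ptune" interface from the Petals fine-tuning examples; the model name, training text, and hyperparameters are placeholders:

    import torch
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "bigscience/bloom"  # placeholder
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(
        model_name,
        tuning_mode="ptune",  # learn soft prompt embeddings on the client
        pre_seq_len=16,       # number of trainable prompt tokens
    )

    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(trainable, lr=1e-3)

    batch = tokenizer("One placeholder training example", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # forward spans remote peers
    loss.backward()  # gradients reach only the local prompt embeddings
    opt.step()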

Comparison with Similar Tools

  • llama.cpp — optimized single-machine inference; Petals distributes across many machines for models that exceed local capacity
  • vLLM — high-throughput serving on a single node or cluster; Petals targets volunteer-style distributed setups
  • Ollama — simplified local LLM experience; Petals handles models too large for any single machine
  • ExLlamaV2 — quantized inference for fitting models on one GPU; Petals splits the model across many GPUs instead of compressing it onto one
  • Together AI — managed distributed inference; Petals is self-hosted and free

FAQ

Q: How fast is inference compared to running the full model locally? A: Latency depends on network speed between peers. On a well-connected swarm, generation is interactive (a few tokens per second for large models), though slower than dedicated hardware.

Q: What models are supported? A: Petals supports popular open Transformer architectures; the public swarm typically hosts BLOOM and Llama variants. Private swarms can host any model the Petals server supports.

Q: Is my data private when using the public swarm? A: Intermediate activations pass through other participants' machines. For sensitive data, run a private swarm with trusted peers.

Q: Can I contribute GPU time without running inference myself? A: Yes. Run the server command to donate your GPU to the public swarm; your machine serves model layers for other users' requests, at no cost beyond electricity and bandwidth.
