nano-vllm is a minimal, educational, and performant LLM inference engine that reimplements core vLLM concepts in clean Python for easy understanding and extension.
nano-vllm — Lightweight LLM Serving Engine
This asset can be read and installed directly by agents
TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.
npx tokrepo install 27f1bbc3-470d-11f1-9bc6-00163e2b0d79
Related Assets
vLLM — High-Throughput LLM Serving Engine
vLLM is a high-throughput and memory-efficient LLM inference engine. 74.8K+ GitHub stars. PagedAttention, continuous batching, OpenAI-compatible API, multi-GPU serving. Apache 2.0.
nano-graphrag — Lightweight GraphRAG Implementation
A simple, hackable implementation of Microsoft GraphRAG that builds knowledge graphs from documents and uses graph-based retrieval for more accurate LLM question answering.
Anime.js — Lightweight JavaScript Animation Engine
A lightweight JavaScript animation library with a simple yet powerful API for CSS properties, SVG, DOM attributes, and JavaScript objects.
Rathole — Lightweight High-Performance Reverse Proxy for NAT Traversal in Rust
A fast and resource-efficient reverse proxy written in Rust for exposing local services behind NATs and firewalls, serving as a lightweight alternative to frp and ngrok.