CLI Tools · May 14, 2026 · 2 min read

vllm-cli — vLLM Model Serving CLI (Python)

vllm-cli is a CLI for serving models with vLLM: a verified 493★ project supporting Python 3.9+, with docs covering profiles, shortcuts, and `serve --model` workflows.

Agent ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Native · 94/100 · Policy: allow

  • Agent surface: Any MCP/CLI agent
  • Kind: CLI
  • Install: Bundle
  • Trust: Established
  • Entrypoint: `pip install vllm-cli`
  • Universal CLI install command: `npx tokrepo install 40ec8ddf-a76c-5fa0-9d20-f54ab035128d`
Intro


Best for: Builders who want a menu-driven TUI plus scriptable commands for managing vLLM model servers

Works with: Python 3.9+, vLLM installed separately (README notes CUDA/PyTorch compatibility), optional uv/conda workflows

Setup time: 15-30 minutes
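The install choice above can be sketched as a tiny chooser script; `WITH_VLLM` is a placeholder flag for illustration only, not part of vllm-cli:

```shell
#!/bin/sh
# Choose between the two README-documented install paths.
# WITH_VLLM is a placeholder flag for this sketch only.
WITH_VLLM=${WITH_VLLM:-no}
if [ "$WITH_VLLM" = "yes" ]; then
  # The [vllm] extra pulls in vLLM itself via pip.
  INSTALL_CMD='pip install "vllm-cli[vllm]"'
else
  # Base package only: vLLM must already be installed and
  # matched to your CUDA/PyTorch stack.
  INSTALL_CMD='pip install vllm-cli'
fi
echo "$INSTALL_CMD"
```

With `WITH_VLLM` unset this prints the base `pip install vllm-cli` entrypoint documented in the README.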

Key facts (verified)

  • GitHub: 493 stars · 28 forks · last pushed 2026-01-25.
  • License: MIT · owner avatar + repo URL verified via GitHub API.
  • README-backed entrypoint: pip install vllm-cli.

Main

  • Start in interactive mode (vllm-cli) when setting up GPUs/profiles, then switch to command-line mode for repeatable automation runs.

  • Use built-in profiles and shortcuts to codify serving parameters; README shows serve --shortcut and hardware-optimized GPT-OSS profiles.

  • Treat vLLM install as a separate compatibility step: README warns CUDA kernels must match PyTorch versions and vLLM-CLI won’t install vLLM by default.
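A minimal sketch of that interactive-then-scripted handoff, using the README's basic-usage model name (the TUI launch is shown as a comment since it needs a terminal):

```shell
#!/bin/sh
# First-time setup: launch the interactive TUI (needs a terminal):
#   vllm-cli
# Repeatable automation: build the serve command explicitly.
# The model name is the README's own basic-usage example.
MODEL="openai/gpt-oss-20b"
SERVE_CMD="vllm-cli serve --model $MODEL"
echo "$SERVE_CMD"
```

Pinning the full command string like this is what makes the run repeatable in CI or cron, versus re-driving the menu each time.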

Source-backed notes

  • README documents Python 3.9+ support and multiple install options, including `pip install vllm-cli` and `pip install vllm-cli[vllm]`.
  • README includes a basic usage snippet: `vllm-cli serve --model openai/gpt-oss-20b`.
  • README notes vLLM binary compatibility concerns and recommends uv/conda-style installs for PyTorch/CUDA alignment.

FAQ

  • Does vllm-cli install vLLM for me?: Not by default — README says vLLM-CLI will not install vLLM or PyTorch unless you install the `[vllm]` extra.
  • What is the first serving command to try?: README shows vllm-cli serve --model openai/gpt-oss-20b as a basic example.
  • Why does install matter?: README warns vLLM uses pre-compiled CUDA kernels that must match your PyTorch version.
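A hedged pre-install check along those lines, assuming nothing beyond a POSIX shell: the `python3` one-liner simply reports the local PyTorch/CUDA pairing if torch is importable, so you can match the vLLM build to it.

```shell
#!/bin/sh
# Record the local PyTorch/CUDA pairing before installing vLLM,
# since its pre-compiled CUDA kernels must match the PyTorch build.
# Falls back to a notice if torch is not importable yet.
python3 -c 'import torch; print(torch.__version__, torch.version.cuda)' \
  2>/dev/null || echo "torch not installed yet"
```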

Source & Thanks

Source: https://github.com/Chen-zexi/vllm-cli · License: MIT · GitHub stars: 493 · forks: 28
