# vllm-cli — vLLM Model Serving CLI (Python)

> vllm-cli is a CLI for serving models with vLLM; verified 493★ with Python 3.9+ and docs for profiles, shortcuts, and `serve --model` workflows.

## Install

Copy the commands below into your terminal:

## Quick Use

```bash
# Recommended flow per README (install vLLM first, then vllm-cli):
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install vllm --torch-backend=auto
uv pip install --upgrade vllm-cli
vllm-cli serve --model openai/gpt-oss-20b
```

## Intro

**Best for:** builders who want a menu-driven TUI plus scriptable commands for managing vLLM model servers.
**Works with:** Python 3.9+; vLLM installed separately (the README notes CUDA/PyTorch compatibility); optional uv/conda workflows.
**Setup time:** 15-30 minutes.

### Key facts (verified)

- GitHub: 493 stars · 28 forks · pushed 2026-01-25.
- License: MIT · owner avatar and repo URL verified via the GitHub API.
- README-backed entrypoint: `pip install vllm-cli`.

## Main

- Start in interactive mode (`vllm-cli`) when setting up GPUs and profiles, then switch to command-line mode for repeatable automation runs.
- Use built-in profiles and shortcuts to codify serving parameters; the README shows `serve --shortcut` and hardware-optimized GPT-OSS profiles.
- Treat the vLLM install as a separate compatibility step: the README warns that CUDA kernels must match PyTorch versions and that vLLM-CLI will not install vLLM by default.

### Source-backed notes

- The README documents Python 3.9+ support and multiple install options, including `pip install vllm-cli` and `pip install vllm-cli[vllm]`.
- The README includes a basic usage snippet: `vllm-cli serve --model openai/gpt-oss-20b`.
- The README notes vLLM binary-compatibility concerns and recommends uv/conda-style installs for PyTorch/CUDA alignment.
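Because vllm-cli will not install vLLM for you, it can help to gate the serve command behind a couple of environment checks. A minimal pre-flight sketch of that idea (the check script itself is an illustration, not part of vllm-cli; only the Python 3.9+ requirement and the `serve --model` command are README-documented):

```bash
#!/usr/bin/env bash
# Pre-flight sketch: verify the environment before running `vllm-cli serve`.
set -euo pipefail

# 1. Python must be 3.9+ per the README.
python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)' \
  && echo "python: ok"

# 2. vllm-cli does not install vLLM by default, so check it is importable.
if python3 -c 'import vllm' 2>/dev/null; then
  echo "vllm: installed"
else
  echo "vllm: missing (install it first, e.g. uv pip install vllm)"
fi

# 3. Only then start the server (README-documented command, left commented here):
# vllm-cli serve --model openai/gpt-oss-20b
```

Running this before automation scripts surfaces the CUDA/PyTorch mismatch failure mode early, rather than at server start.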
### FAQ

- **Does vllm-cli install vLLM for me?** Not by default; the README says vLLM-CLI will not install vLLM or PyTorch unless you use the `[vllm]` extra.
- **What is the first serving command to try?** The README shows `vllm-cli serve --model openai/gpt-oss-20b` as a basic example.
- **Why does the install matter?** The README warns that vLLM uses pre-compiled CUDA kernels that must match your PyTorch version.

## Source & Thanks

> Source: https://github.com/Chen-zexi/vllm-cli
> License: MIT
> GitHub stars: 493 · forks: 28

---

Source: https://tokrepo.com/en/workflows/vllm-cli-vllm-model-serving-cli-python
Author: Script Depot