Main
- Start in interactive mode (`vllm-cli`) when setting up GPUs and profiles, then switch to command-line mode for repeatable automation runs.
- Use built-in profiles and shortcuts to codify serving parameters; the README shows `serve --shortcut` and hardware-optimized GPT-OSS profiles.
- Treat the vLLM install as a separate compatibility step: the README warns that vLLM's CUDA kernels must match your PyTorch version and that vLLM-CLI won't install vLLM by default.
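The two modes above might look like this in practice; a minimal sketch, assuming vLLM-CLI is already installed (`my-gpt-oss` is a hypothetical shortcut name, not one from the README):

```shell
# Interactive mode: launch the TUI to inspect GPUs and save a serving profile.
vllm-cli

# Command-line mode: replay the saved configuration for repeatable runs.
# "serve --shortcut" is shown in the README; the shortcut name is illustrative.
vllm-cli serve --shortcut my-gpt-oss
```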
Source-backed notes
- README documents Python 3.9+ support and multiple install options, including `pip install vllm-cli` and `pip install vllm-cli[vllm]`.
- README includes a basic usage snippet: `vllm-cli serve --model openai/gpt-oss-20b`.
- README notes vLLM binary compatibility concerns and recommends uv/conda-style installs for PyTorch/CUDA alignment.
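The install options above can be sketched as follows; a non-authoritative example assuming a recent pip (the uv line is the README's recommended alternative, shown commented out):

```shell
# Minimal install: vLLM-CLI only; vLLM and PyTorch must already be present.
pip install vllm-cli

# Convenience extra: also pulls in vLLM (quotes keep zsh from globbing the brackets).
pip install "vllm-cli[vllm]"

# If you manage PyTorch/CUDA alignment yourself, a uv-style install keeps pins explicit:
# uv pip install vllm-cli
```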
FAQ
- Does vllm-cli install vLLM for me? Not by default: the README says vLLM-CLI will not install vLLM or PyTorch unless you use the `[vllm]` extra.
- What is the first serving command to try? The README shows `vllm-cli serve --model openai/gpt-oss-20b` as a basic example.
- Why does the install matter? The README warns that vLLM uses pre-compiled CUDA kernels that must match your PyTorch version.
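The kernel/PyTorch mismatch warning can be made concrete with a small version check; a hedged sketch that compares a wheel-style CUDA tag (e.g. `cu121`) against the string PyTorch reports in `torch.version.cuda`. The helper below is illustrative and not part of vllm-cli:

```python
from typing import Optional

def cuda_tag_matches(wheel_tag: str, torch_cuda: Optional[str]) -> bool:
    """Compare a wheel-style CUDA tag like 'cu121' with a
    torch.version.cuda string like '12.1' (None on CPU-only builds)."""
    if torch_cuda is None or not wheel_tag.startswith("cu"):
        return False
    digits = wheel_tag[2:]
    if len(digits) < 2 or not digits.isdigit():
        return False
    # 'cu121' -> '12.1': the last digit is the minor version.
    normalized = f"{digits[:-1]}.{digits[-1]}"
    return normalized == torch_cuda

# A cu121 wheel pairs with a PyTorch build reporting CUDA 12.1.
print(cuda_tag_matches("cu121", "12.1"))  # True
print(cuda_tag_matches("cu118", "12.1"))  # False
```

In real use you would read the right-hand side from `torch.version.cuda` after importing torch; the point is only that the two version strings must agree before vLLM's pre-compiled kernels will load.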