[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"workflow-openllm-serve-open-source-llms-25799792":3,"seo:featured-workflow:25799792-379e-4d8c-a8c7-85034a2129d9:zh":39,"workflow-related-openllm-serve-open-source-llms-25799792-25799792-379e-4d8c-a8c7-85034a2129d9":87},{"id":4,"uuid":5,"slug":6,"title":7,"description":8,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":14,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":16,"tags":23,"has_voted":29,"visibility":19,"share_token":13,"is_featured":12,"content_hash":30,"asset_kind":31,"target_tools":32,"install_mode":36,"entrypoint":37,"risk_profile":38,"dependencies":40,"verification":50,"agent_metadata":53,"agent_fit":64,"trust":75,"provenance":83,"created_at":85,"updated_at":86},3108,"25799792-379e-4d8c-a8c7-85034a2129d9","openllm-serve-open-source-llms","OpenLLM — Serve Open-Source LLMs","Serve open-source LLMs with a unified CLI, multiple backends, and production deployment paths. Start with `openllm hello`, then serve a real model.","8a911193-3180-11f1-9bc6-00163e2b0d79","AI Open Source","https:\u002F\u002Ftokrepo.com\u002Fapple-touch-icon.png",0,"",16,"en",[17],{"id":18,"step_order":19,"title":20,"description":13,"prompt_template":21,"variables":13,"depends_on":22,"expected_output":13},3671,1,"Asset","# OpenLLM — Serve Open-Source LLMs\n\n> Serve open-source LLMs with a unified CLI, multiple backends, and production deployment paths. Start with `openllm hello`, then serve a real model.\n\n## Quick Use\n\n1. Install:\n   ```bash\n   pip install openllm\n   ```\n2. Run:\n   ```bash\n   openllm hello\n   ```\n3. Verify:\n   - Run one `openllm serve ...` command for a small model and confirm you can hit the HTTP endpoint locally.\n\n\n---\n\n## Intro\n\nServe open-source LLMs with a unified CLI, multiple backends, and production deployment paths. Start with `openllm hello`, then serve a real model.\n\n- **Best for:** Teams who want a consistent local-to-cloud path for serving open models without hand-rolling inference servers\n- **Works with:** Python, CLI workflows, open model serving (local + container\u002Fcloud patterns per repo docs)\n- **Setup time:** 20 minutes\n\n\n### Quantitative Notes\n\n- Setup time ~20 minutes (pip install + hello + first serve)\n- GitHub stars + forks (verified): see Source & Thanks\n- Start with a small model first, then scale to larger sizes to avoid long downloads\n\n\n---\n\n## Practical Notes\n\nA pragmatic workflow is: validate the runtime with `openllm hello`, then serve a small model locally, write a single health-check endpoint, and finally containerize. Track cold start time and memory usage, and bake model downloads into images only when you accept the tradeoff.\n\n**Safety note:** Do not expose unauthenticated model endpoints on the public internet; add auth, rate limits, and logging.\n\n### FAQ\n\n**Q: Is OpenLLM an inference engine?**\nA: It’s a serving toolkit\u002FCLI that helps you run models using supported backends and deploy patterns.\n\n**Q: Can I use it in Docker\u002FKubernetes?**\nA: Yes. The repo describes container and cloud deployment workflows; start local first.\n\n**Q: How do I pick a model?**\nA: Pick the smallest model that meets quality requirements; measure latency and memory before scaling up.\n\n---\n\n## Source & Thanks\n\n> GitHub: https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\n> Owner avatar: https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F49176046?v=4\n> License (SPDX): Apache-2.0\n> GitHub stars (verified via `api.github.com\u002Frepos\u002Fbentoml\u002FOpenLLM`): 12,318\n> GitHub forks (verified via `api.github.com\u002Frepos\u002Fbentoml\u002FOpenLLM`): 810\n\n\n---\n\n\u003C!-- ZH -->\n\n# OpenLLM——用统一 CLI 部署开源大模型\n\n> 用统一 CLI 部署开源 LLM：支持多种推理后端与更贴近生产的部署路径（本地、容器与云）。先跑 `openllm hello`，再切到真实模型做服务化、健康检查与接口验证，并便于统一管理版本。\n\n## 快速使用\n\n1. 安装：\n   ```bash\n   pip install openllm\n   ```\n2. 运行：\n   ```bash\n   openllm hello\n   ```\n3. 验证：\n   - Run one `openllm serve ...` command for a small model and confirm you can hit the HTTP endpoint locally.\n\n\n---\n\n## 简介\n\n用统一 CLI 部署开源 LLM：支持多种推理后端与更贴近生产的部署路径（本地、容器与云）。先跑 `openllm hello`，再切到真实模型做服务化、健康检查与接口验证，并便于统一管理版本。\n\n- **适合谁（Best for）:** 希望从本地到云端用一致方式部署开源模型、又不想手写推理服务的团队\n- **兼容工具（Works with）:** Python、CLI 工作流、开源模型服务化（本地 + 容器\u002F云部署方式见仓库）\n- **安装时间（Setup time）:** 20 分钟\n\n\n### 量化信息\n\n- 跑通约 20 分钟（pip 安装 + hello + 第一次 serve）\n- GitHub stars + forks（已核验）：见「来源与感谢」\n- 建议先用小模型跑通，再逐步升级更大模型，避免下载\u002F启动时间过长\n\n\n---\n\n## 实战要点\n\n务实的流程：先用 `openllm hello` 验证运行时，再本地 serve 一个小模型，补一个健康检查接口，最后再容器化。重点关注冷启动时间与内存占用；只有在接受镜像体积换取启动速度时，才把模型下载打进镜像。\n\n**安全提示：** 不要把未鉴权的模型接口直接暴露公网；需要配鉴权、限流与日志审计。\n\n### FAQ\n\n**Q: OpenLLM 是推理引擎吗？**\nA: 它更像服务化工具链\u002FCLI：封装后端与部署流程，帮你把模型跑起来并暴露接口。\n\n**Q: 能用于 Docker\u002FK8s 吗？**\nA: 可以。仓库提供容器与云部署流程；建议先本地跑通再上云。\n\n**Q: 模型怎么选？**\nA: 优先选择满足质量要求的最小模型，并先测延迟与内存再扩大规模。\n\n---\n\n## 来源与感谢\n\n> GitHub：https:\u002F\u002Fgithub.com\u002Fbentoml\u002FOpenLLM\n> Owner avatar：https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F49176046?v=4\n> 许可证（SPDX）：Apache-2.0\n> GitHub stars（已通过 `api.github.com\u002Frepos\u002Fbentoml\u002FOpenLLM` 核验）：12,318\n> GitHub forks（已通过 `api.github.com\u002Frepos\u002Fbentoml\u002FOpenLLM` 核验）：810\n","0",[24],{"id":25,"name":26,"slug":27,"icon":28},14,"CLI Tools","cli","🖥️",false,"605088378aa7a530ce53346906dcb0de494442820e03b22412fe904a42ba89ee","cli_tool",[33,34,35],"claude_code","codex","gemini_cli","single","README.md",{"executes_code":29,"modifies_global_config":29,"requires_secrets":39,"uses_absolute_paths":29,"network_access":29},null,{"npm":41,"pip":42,"brew":48,"system":49},[],[43,44,45,46,47],"+","first","hello","openllm","serve)",[],[],{"commands":51,"expected_files":52},[],[20],{"asset_kind":31,"target_tools":54,"install_mode":36,"entrypoint":37,"risk_profile":55,"dependencies":56,"content_hash":30,"verification":61},[33,34,35],{"executes_code":29,"modifies_global_config":29,"requires_secrets":39,"uses_absolute_paths":29,"network_access":29},{"npm":57,"pip":58,"brew":59,"system":60},[],[43,44,45,46,47],[],[],{"commands":62,"expected_files":63},[],[20],{"target":34,"score":65,"status":66,"policy":66,"why":67,"asset_kind":31,"install_mode":36},29,"stage_only",[68,69,70,71,72,73,74],"target_tools includes codex","asset_kind cli_tool","install_mode single","markdown-only","policy stage_only","asset_kind cli_tool is not activated directly for Codex","trust established",{"author_trust_level":76,"verified_publisher":29,"asset_signed_hash":30,"signature_status":77,"install_count":12,"report_count":12,"dangerous_capability_badges":78,"review_status":79,"signals":80},"established","hash_only",[31],"unreviewed",[81,82],"author has published assets","content hash available",{"owner_uuid":9,"owner_name":10,"source_url":84,"content_hash":30,"visibility":19,"created_at":85,"updated_at":86},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fopenllm-serve-open-source-llms","2026-05-12 03:00:18","2026-05-14 08:16:26",[]]