Key Features
- Python-first: Type hints auto-generate REST API schema
- Auto Docker: One command to containerize with all dependencies
- Dynamic batching: Automatically batch requests for throughput
- Model parallelism: Multi-GPU and multi-model serving
- Any framework: PyTorch, TensorFlow, HuggingFace, ONNX, XGBoost
- BentoCloud: Managed deployment with auto-scaling
FAQ
Q: What is BentoML?
A: BentoML is an Apache 2.0-licensed Python framework (8.6K+ GitHub stars) for turning ML models into production REST APIs, with automatic Docker containerization, dynamic batching, and support for any ML framework.
Q: How do I install BentoML?
A: Run `pip install -U bentoml`. Decorate your service class with `@bentoml.service` and its endpoint methods with `@bentoml.api`, then start the server with `bentoml serve`.