Scripts · Mar 31, 2026 · 2 min read

BentoML — Build AI Model Serving APIs

BentoML builds model inference REST APIs and multi-model serving systems from Python scripts. 8.6K+ GitHub stars. Auto Docker, dynamic batching, any ML framework. Apache 2.0.

Introduction

BentoML is a Python framework for building online serving systems optimized for AI apps and model inference. With 8,600+ GitHub stars and Apache 2.0 license, it turns model inference scripts into production REST APIs using Python type hints, automatically generates Docker containers with dependency management, provides performance optimization through dynamic batching and model parallelism, and supports any ML framework and inference runtime. Deploy to Docker or BentoCloud for production.

Best for: ML engineers deploying models as production APIs with minimal boilerplate
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Frameworks: PyTorch, TensorFlow, HuggingFace, ONNX, XGBoost, any runtime


Key Features

  • Python-first: Type hints auto-generate REST API schema
  • Auto Docker: One command to containerize with all dependencies
  • Dynamic batching: Automatically batch requests for throughput
  • Model parallelism: Multi-GPU and multi-model serving
  • Any framework: PyTorch, TensorFlow, HuggingFace, ONNX, XGBoost
  • BentoCloud: Managed deployment with auto-scaling
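Dynamic batching groups requests that arrive within a short window and runs them through the model together, trading a few milliseconds of latency for much higher throughput. BentoML handles this internally; the stdlib-only toy below sketches the core idea (class and method names here are illustrative, not BentoML's API):

```python
import queue
import threading
import time

class MicroBatcher:
    """Toy dynamic batcher: requests arriving within max_wait_ms are
    grouped (up to max_batch_size) and processed in one call."""

    def __init__(self, predict_batch, max_batch_size=8, max_wait_ms=5):
        self.predict_batch = predict_batch  # fn: list[input] -> list[output]
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait_ms / 1000.0
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, item):
        """Called once per request; blocks until the batched result is ready."""
        slot = {"input": item, "done": threading.Event()}
        self.requests.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            # Wait for at least one request, then keep collecting until the
            # batch is full or the wait window expires.
            batch = [self.requests.get()]
            deadline = time.monotonic() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.predict_batch([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

Each caller still sees a simple request/response interface; only the worker loop knows about batches. This is why batched inference needs no changes to client code.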

FAQ

Q: What is BentoML? A: BentoML is a Python framework with 8.6K+ stars for turning ML models into production REST APIs. Auto Docker, dynamic batching, any framework. Apache 2.0.

Q: How do I install BentoML? A: Run pip install -U bentoml. Decorate your service class with @bentoml.service and its methods with @bentoml.api, then start the server with bentoml serve.
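The "Python-first" idea is that the method's type hints are enough to derive the HTTP request/response schema. The stdlib-only sketch below illustrates that mechanism conceptually; the function api_schema and the type mapping are hypothetical, not BentoML's actual implementation:

```python
import typing

# Rough mapping from Python annotations to JSON-schema-style type names.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def api_schema(fn):
    """Derive a minimal request/response schema from a function's type
    hints, roughly how a framework can build an OpenAPI spec for free."""
    hints = typing.get_type_hints(fn)
    ret = hints.pop("return", None)
    params = {name: PY_TO_JSON.get(tp, "object") for name, tp in hints.items()}
    return {
        "input": {"type": "object", "properties": params},
        "output": {"type": PY_TO_JSON.get(ret, "object")},
    }

def summarize(text: str, max_words: int) -> str:
    """Example endpoint signature; body omitted."""
    ...
```

Calling api_schema(summarize) yields a JSON contract with text as a string field, max_words as an integer field, and a string response, with no schema written by hand.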



Source and acknowledgments

Created by BentoML. Licensed under Apache 2.0. bentoml/BentoML — 8,600+ GitHub stars

