Scripts · Mar 31, 2026 · 2 min read

BentoML — Build AI Model Serving APIs

BentoML builds model inference REST APIs and multi-model serving systems from Python scripts. 8.6K+ GitHub stars. Auto Docker, dynamic batching, any ML framework. Apache 2.0.

Introduction

BentoML is a Python framework for building online serving systems optimized for AI apps and model inference. With 8,600+ GitHub stars and Apache 2.0 license, it turns model inference scripts into production REST APIs using Python type hints, automatically generates Docker containers with dependency management, provides performance optimization through dynamic batching and model parallelism, and supports any ML framework and inference runtime. Deploy to Docker or BentoCloud for production.

Best for: ML engineers deploying models as production APIs with minimal boilerplate
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Frameworks: PyTorch, TensorFlow, HuggingFace, ONNX, XGBoost, any runtime


Key Features

  • Python-first: Type hints auto-generate REST API schema
  • Auto Docker: One command to containerize with all dependencies
  • Dynamic batching: Automatically batch requests for throughput
  • Model parallelism: Multi-GPU and multi-model serving
  • Any framework: PyTorch, TensorFlow, HuggingFace, ONNX, XGBoost
  • BentoCloud: Managed deployment with auto-scaling
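The "Python-first" point above can be illustrated with plain standard-library introspection. The sketch below (with a hypothetical request_schema helper) shows how typed method signatures can be mapped to an API schema; it captures the idea only and is not BentoML's actual implementation:

```python
import typing

def request_schema(fn) -> dict:
    """Derive a simple JSON-style request/response schema from a
    function's type hints -- similar in spirit to how a typed Python
    method can become a REST API schema."""
    hints = typing.get_type_hints(fn)
    returns = hints.pop("return", None)
    py_to_json = {str: "string", int: "integer", float: "number", bool: "boolean"}
    return {
        "input": {name: py_to_json.get(tp, "object") for name, tp in hints.items()},
        "output": py_to_json.get(returns, "object"),
    }

def summarize(text: str, max_words: int) -> str:
    ...

schema = request_schema(summarize)
# → {"input": {"text": "string", "max_words": "integer"}, "output": "string"}
```

Because the schema comes from the type hints themselves, the function signature stays the single source of truth for both the Python code and the HTTP interface.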

FAQ

Q: What is BentoML? A: BentoML is a Python framework with 8.6K+ stars for turning ML models into production REST APIs. Auto Docker, dynamic batching, any framework. Apache 2.0.

Q: How do I install BentoML? A: pip install -U bentoml. Decorate your class with @bentoml.service, decorate its methods with @bentoml.api, then run bentoml serve.
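The decorator workflow described above can be sketched with a standard-library toy: one decorator marks a class as a serving unit, the other registers methods as endpoints. The names mirror @bentoml.service and @bentoml.api, but this is an illustration of the pattern, not BentoML's code:

```python
def api(fn):
    fn._is_api = True  # tag the method as an endpoint
    return fn

def service(cls):
    # collect every @api-tagged method into an endpoint table
    cls.endpoints = {
        name: attr for name, attr in vars(cls).items()
        if getattr(attr, "_is_api", False)
    }
    return cls

@service
class Summarizer:
    @api
    def summarize(self, text: str) -> str:
        return text[:50]  # stand-in for real model inference

print(list(Summarizer.endpoints))  # → ['summarize']
```

In BentoML itself, the registered endpoints are what bentoml serve exposes as REST routes.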



Source and acknowledgments

Created by BentoML. Licensed under Apache 2.0. bentoml/BentoML — 8,600+ GitHub stars

