Skills · Apr 9, 2026 · 3 min read

Manifest: Smart LLM Router That Cuts Costs by up to 70%

Intelligent LLM routing that scores requests across 23 dimensions in under 2ms. Routes to the cheapest capable model among 300+ options from 13+ providers. MIT, 4,200+ stars.

AI Open Source · Community

Agent-ready

Install with prior review

This asset requires review. The copied prompt requests a dry run, shows the planned writes, and continues only after confirmation.

Needs Confirmation · 66/100 · Policy: confirm

Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: step-1.md

Review-first command

npx -y tokrepo@latest install 15266cba-33d7-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first, confirm the writes, then run this command.

TL;DR

Manifest scores each LLM request across 23 dimensions in under 2ms and routes it to the cheapest capable model.

§01

What it is

Manifest is an intelligent LLM routing layer that sits between your application and LLM providers. It analyzes each incoming request across 23 complexity dimensions and routes it to the cheapest model that can handle it from a pool of 300+ models across 13+ providers including OpenAI, Anthropic, Google, and DeepSeek.

This tool targets teams running production LLM applications who want to reduce API spending without sacrificing output quality. It works as a drop-in proxy that requires minimal code changes.

§02

How it saves time or tokens

Many LLM requests are simple enough for smaller, cheaper models. Manifest identifies these automatically. Simple classification or extraction tasks go to fast models. Complex reasoning tasks go to powerful ones. The routing decision adds under 2ms of latency. Teams report up to 70% cost reduction by avoiding overpowered models for routine requests.
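The arithmetic behind a figure like this can be sketched in a few lines. The prices below are hypothetical placeholders, not real provider pricing, and the function names are made up for illustration; the point is only to show how the share of traffic sent to a cheaper model drives the overall saving.

```python
# Hypothetical prices in USD per million tokens (assumptions, not real pricing).
PRICE_LARGE = 10.00
PRICE_SMALL = 0.50


def blended_cost(total_mtok: float, share_to_small: float) -> float:
    """Cost in USD when `share_to_small` of the tokens go to the small model."""
    if not 0.0 <= share_to_small <= 1.0:
        raise ValueError("share_to_small must be between 0 and 1")
    small = total_mtok * share_to_small * PRICE_SMALL
    large = total_mtok * (1.0 - share_to_small) * PRICE_LARGE
    return small + large


def savings_fraction(share_to_small: float) -> float:
    """Saving relative to sending every token to the large model."""
    baseline = blended_cost(1.0, 0.0)
    return 1.0 - blended_cost(1.0, share_to_small) / baseline


# With these assumed prices, routing 75% of tokens to the small model
# saves a little over 71% compared with using the large model for everything.
print(f"{savings_fraction(0.75):.2%}")
```

Actual savings depend on the real price gap between models and on how much of the traffic is simple enough to downgrade.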

§03

How to use

Install Manifest via OpenClaw plugin or run locally with Docker.
Open the dashboard at http://127.0.0.1:2099 and add your provider API keys.
Point your application to the Manifest proxy endpoint instead of calling providers directly.

# Docker install
docker pull mnfst/manifest
docker run -p 2099:2099 mnfst/manifest

# Or via OpenClaw
openclaw plugins install manifest
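
After starting the container, it can help to confirm that something is listening on the dashboard port before adding provider keys. A minimal sketch using only the Python standard library: the host and port are taken from the dashboard URL above, while the function name `proxy_is_listening` is hypothetical. A successful TCP connection only shows that the port is open, not that Manifest itself is healthy.

```python
import socket


def proxy_is_listening(host: str = "127.0.0.1", port: int = 2099,
                       timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example: proxy_is_listening() checks 127.0.0.1:2099 by default.
```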

§04

Example

Manifest works as a transparent proxy. Your existing API calls route through it:

import openai

# Point to Manifest proxy instead of OpenAI directly
client = openai.OpenAI(
    base_url='http://localhost:2099/v1',
    api_key='your-manifest-key'
)

# Manifest picks the cheapest capable model automatically
response = client.chat.completions.create(
    model='auto',  # let Manifest decide
    messages=[{'role': 'user', 'content': 'Summarize this paragraph...'}]
)

§05

Related on TokRepo

AI Gateway Providers — Compare LiteLLM and other gateway solutions for multi-provider routing
AI Tools for API Management — Tools for managing and optimizing API integrations

§06

Common pitfalls

The 23-dimension scoring is tuned for English text. Non-English requests may get misrouted to weaker models that handle the language poorly.
Budget controls need careful thresholds. Setting cost caps too aggressively can force all traffic to the cheapest models and degrade quality.
Automatic fallbacks can mask provider outages. Monitor per-provider success rates separately to catch issues early.
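
For the last pitfall, per-provider success rates can be tracked on the application side. The sketch below is an assumption about how one might do that in plain Python; it is not part of Manifest, and the class name, threshold, and minimum sample size are illustrative.

```python
from collections import defaultdict
from typing import List, Optional


class ProviderStats:
    """Counts successes and totals per provider (application-side, hypothetical)."""

    def __init__(self) -> None:
        self._ok = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, provider: str, success: bool) -> None:
        self._total[provider] += 1
        if success:
            self._ok[provider] += 1

    def success_rate(self, provider: str) -> Optional[float]:
        """Return the success rate, or None if the provider has no traffic yet."""
        total = self._total.get(provider, 0)
        if total == 0:
            return None
        return self._ok[provider] / total

    def unhealthy(self, threshold: float = 0.95, min_requests: int = 20) -> List[str]:
        """Providers with enough traffic whose success rate is below `threshold`."""
        return sorted(
            p for p, total in self._total.items()
            if total >= min_requests and self._ok[p] / total < threshold
        )
```

Recording the provider that first received each request, rather than the one that finally answered, keeps fallbacks from hiding the original failure.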

Frequently asked questions

How does Manifest decide which model to use?

Manifest scores each request across 23 dimensions including complexity, length, domain, and required capabilities. This scoring happens in under 2ms. It then matches the request profile against known model capabilities and selects the cheapest model that meets the quality threshold.
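
The "cheapest model that meets the threshold" step can be illustrated with a toy example. This is not Manifest's scoring logic: the two-signal scorer stands in for the 23 dimensions, and the model names, capability scores, and prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical identifier
    capability: float     # hypothetical capability score in [0, 1]
    cost_per_mtok: float  # hypothetical USD per million tokens


MODELS = [
    Model("small-model", 0.30, 0.50),
    Model("mid-model", 0.60, 3.00),
    Model("large-model", 0.95, 10.00),
]


def score_request(prompt: str) -> float:
    """Toy stand-in for the scorer: uses only prompt length and one keyword."""
    length_signal = min(len(prompt) / 2000.0, 1.0)
    reasoning_signal = 0.9 if "prove" in prompt.lower() else 0.0
    return max(length_signal, reasoning_signal)


def cheapest_capable(models: Sequence[Model], required: float) -> Model:
    """Pick the lowest-cost model whose capability meets the requirement."""
    capable = [m for m in models if m.capability >= required]
    if not capable:
        raise LookupError("no model meets the required capability")
    return min(capable, key=lambda m: m.cost_per_mtok)


print(cheapest_capable(MODELS, score_request("Summarize this paragraph")).name)
print(cheapest_capable(MODELS, score_request("Prove this bound is tight")).name)
```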

Does Manifest support streaming responses?

Yes. Manifest proxies streaming responses from the selected provider transparently. Your application receives Server-Sent Events exactly as it would from the original provider, with no buffering or modification.
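
On the client side, a streamed reply can be consumed the same way as with the OpenAI Python SDK, where `client.chat.completions.create(..., stream=True)` returns an iterator of chunks. The helper below is a sketch that assumes chunks follow that SDK shape (`chunk.choices[0].delta.content`); the helper names are hypothetical, and `fake_chunk` exists only so the example runs without a server.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def join_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas from an OpenAI-style chunk iterator."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (for example usage-only chunks)
        piece = chunk.choices[0].delta.content
        if piece:  # the delta content may be None or empty
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build an object shaped like a streaming chunk, for local testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


print(join_stream_text([fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]))
```

With a live proxy, the iterator returned by the SDK call would be passed to `join_stream_text` in place of the fake chunks.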

What happens if the chosen model fails?

Manifest includes automatic fallback logic. If the selected model returns an error or times out, it escalates to the next cheapest capable model. You can configure fallback chains and retry policies in the dashboard.
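
The escalation pattern described here can be sketched generically. This is an illustration of a fallback chain, not Manifest's implementation: the function name is hypothetical, and `call` stands for whatever function sends the request to a given model.

```python
from typing import Callable, Sequence, Tuple


def call_with_fallback(models: Sequence[str],
                       call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order (cheapest first); return (model, reply) on first success."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # broad on purpose: any failure triggers escalation
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky(model: str) -> str:
    """Stand-in for a provider call; the cheapest model always times out."""
    if model == "small-model":
        raise TimeoutError("timed out")
    return f"reply from {model}"


print(call_with_fallback(["small-model", "mid-model", "large-model"], flaky))
```

The collected error messages are kept so that a total failure still reports what went wrong at each step.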

Can I force a specific model for certain requests?

Yes. You can specify a model name directly instead of using the auto routing. You can also create routing rules in the dashboard that pin specific request patterns to specific models.
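
The same idea can be approximated in application code by choosing the `model` value before the request is sent. The rules and model names below are hypothetical and do not reflect the dashboard's rule format; `'auto'` is the value used in the example earlier on this page.

```python
import re

# Hypothetical first-match rules: (pattern, model name to pin).
PIN_RULES = [
    (re.compile(r"\bcontract\b", re.IGNORECASE), "large-model"),
    (re.compile(r"^classify:", re.IGNORECASE), "small-model"),
]


def model_for(prompt: str, rules=PIN_RULES) -> str:
    """Return the pinned model for the first matching rule, else 'auto'."""
    for pattern, model in rules:
        if pattern.search(prompt):
            return model
    return "auto"


print(model_for("Classify: spam or not?"))
print(model_for("What is the capital of France?"))
```

The returned string would then be passed as the `model` argument of the chat completion call.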

Is Manifest open source?

Yes. Manifest is MIT licensed with 4,200+ GitHub stars. You can self-host it with Docker or run it as an OpenClaw plugin. The source code and documentation are available on GitHub.

Source and acknowledgements

Created by mnfst. Licensed under MIT.

Manifest — ⭐ 4,200+

Thanks to the Manifest team for making LLM cost optimization accessible.
