TokenCost — LLM Price Calculator for 400+ Models
Client-side token counting and USD cost estimation for 400+ LLMs. 3 lines of Python to track prompt and completion costs. Supports OpenAI, Anthropic, Mistral, AWS Bedrock. MIT, 2K+ stars.
Install with review first
This asset requires review. The copied prompt asks for a dry run, shows the writes it would make, and continues only after confirmation.
npx -y tokrepo@latest install 43b26691-33ce-11f1-9bc6-00163e2b0d79 --target codex
Do a dry run first, confirm the writes, and then run this command.
What it is
TokenCost is a Python library that counts tokens and estimates USD costs for over 400 LLM models. In three lines of code, you can calculate the cost of any prompt or completion across OpenAI, Anthropic, Mistral, AWS Bedrock, and other providers. The library runs entirely client-side, using local tokenizers to count tokens without sending data to any API.
TokenCost is aimed at developers building LLM applications who need to track costs, set budgets, or compare pricing across providers. It is especially useful for applications that route between multiple models and need real-time cost visibility.
How it saves time or tokens
TokenCost eliminates the need to manually look up pricing pages and count tokens for each provider. Instead of maintaining a spreadsheet of model prices, you call one function and get the USD cost. This saves time during development and enables automated cost monitoring in production. Pre-checking prompt costs before sending them to the API prevents budget overruns.
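To make the arithmetic concrete, the sketch below shows what a price lookup reduces to: a token count multiplied by a per-token rate, summed over input and output. The price table, the model names, and the `estimate_cost` helper are hypothetical placeholders chosen for illustration; they are not TokenCost's API and not real prices.

```python
from decimal import Decimal

# Hypothetical per-token USD prices, for illustration only (not real rates).
EXAMPLE_PRICES = {
    "example-model-a": {"input": Decimal("0.000002"), "output": Decimal("0.000008")},
    "example-model-b": {"input": Decimal("0.000001"), "output": Decimal("0.000003")},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    price = EXAMPLE_PRICES[model]
    # Decimal avoids binary floating-point drift on very small amounts.
    return input_tokens * price["input"] + output_tokens * price["output"]


if __name__ == "__main__":
    for name in EXAMPLE_PRICES:
        cost = estimate_cost(name, input_tokens=1200, output_tokens=300)
        print(f"{name}: ${cost:.6f}")
```

A library with a bundled pricing database replaces the hand-maintained table above, which is the part that goes stale in a spreadsheet.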
How to use
- Install TokenCost via pip
- Import the cost calculation functions
- Pass your prompt text and model name to get the USD cost
Example
from tokencost import calculate_prompt_cost, calculate_completion_cost
# Calculate prompt cost
prompt = 'Explain quantum computing in simple terms.'
cost = calculate_prompt_cost(prompt, model='gpt-4o')
print(f'Prompt cost: ${cost:.6f}')
# Calculate completion cost
completion = 'Quantum computing uses qubits...'
cost = calculate_completion_cost(completion, model='gpt-4o')
print(f'Completion cost: ${cost:.6f}')
# Compare across models
for model in ['gpt-4o', 'claude-sonnet-4-20250514', 'mistral-large']:
    c = calculate_prompt_cost(prompt, model=model)
    print(f'{model}: ${c:.6f}')
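For applications that route between models, the comparison loop above can be turned into a selection step. The following is a minimal sketch, assuming a cost function that accepts `(prompt, model=...)` as in the example; `cheapest_model` and `fake_cost` are hypothetical names, and `fake_cost` is a stand-in with made-up rates so the snippet runs without any third-party package.

```python
from decimal import Decimal
from typing import Callable, Iterable, Tuple

# Any callable of the form cost_fn(prompt, model=...) -> Decimal.
CostFn = Callable[..., Decimal]


def cheapest_model(prompt: str, models: Iterable[str], cost_fn: CostFn) -> Tuple[str, Decimal]:
    """Return (model, cost) for the lowest estimated prompt cost."""
    costs = {m: cost_fn(prompt, model=m) for m in models}
    best = min(costs, key=costs.get)  # raises ValueError if models is empty
    return best, costs[best]


def fake_cost(prompt: str, model: str) -> Decimal:
    """Stand-in estimator: characters times a made-up per-character rate."""
    rates = {"model-x": Decimal("0.00001"), "model-y": Decimal("0.000004")}
    return len(prompt) * rates[model]


if __name__ == "__main__":
    name, cost = cheapest_model("hello", ["model-x", "model-y"], fake_cost)
    print(f"Route to {name} at ${cost:.6f}")
```

In a real application, `calculate_prompt_cost` from the example could be passed as `cost_fn` in place of `fake_cost`. Prompt cost alone ignores completion pricing, so a production router would likely weigh both.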
Related on TokRepo
- AI tools for monitoring — Browse cost monitoring and observability tools
- AI gateway tools — Explore API gateway solutions with built-in cost tracking
Common pitfalls
- Model prices change frequently; update TokenCost regularly to keep the pricing database current
- Token counts are estimates based on local tokenizers; actual API billing may differ slightly due to special tokens
- Some models use different tokenizers; ensure TokenCost supports your specific model variant for accurate counts
Frequently asked questions
Which models does TokenCost support?
TokenCost supports 400+ models across OpenAI, Anthropic, Mistral, Cohere, AWS Bedrock, Google Vertex AI, and others. The pricing database is updated with each release to reflect current model prices.
Does TokenCost send my data to an external service?
No. TokenCost runs entirely client-side. It uses local tokenizers to count tokens and a bundled pricing database to calculate costs. No data is sent to any external service.
How accurate are the estimates?
TokenCost uses the same tokenizers as the providers (tiktoken for OpenAI, etc.) for accurate token counts. Pricing is based on published rates. Minor differences may occur due to special tokens or rounding.
Can I use TokenCost to enforce a budget?
Yes. You can pre-calculate the cost of a prompt before sending it to the API and reject requests that exceed a budget threshold. This is useful for preventing runaway costs in production applications.
Is TokenCost free to use?
Yes. TokenCost is open-source under the MIT license. There are no usage fees or restrictions. You can use it in commercial projects without any cost.
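The budget check described above could be sketched as follows. This is an illustration under stated assumptions, not part of TokenCost: `check_budget`, `BudgetExceeded`, and `flat_rate_estimate` are hypothetical names, the estimator is assumed to return a `Decimal` and accept `(prompt, model=...)`, and the 10% margin is an arbitrary allowance for the small differences between local estimates and actual billing.

```python
from decimal import Decimal
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when an estimated cost is above the allowed threshold."""


def check_budget(
    prompt: str,
    model: str,
    estimate: Callable[..., Decimal],
    max_usd: Decimal,
    margin: Decimal = Decimal("1.10"),  # arbitrary 10% safety allowance
) -> Decimal:
    """Return the padded estimate, or raise if it exceeds max_usd."""
    padded = estimate(prompt, model=model) * margin
    if padded > max_usd:
        raise BudgetExceeded(f"{model}: estimated ${padded} exceeds ${max_usd}")
    return padded


def flat_rate_estimate(prompt: str, model: str) -> Decimal:
    """Stand-in estimator with a made-up rate of $0.01 per word."""
    return Decimal("0.01") * len(prompt.split())


if __name__ == "__main__":
    try:
        check_budget("a b c", "any-model", flat_rate_estimate, max_usd=Decimal("0.02"))
    except BudgetExceeded as err:
        print(f"Rejected: {err}")
```

Passing the estimator in as an argument keeps the guard independent of any one pricing source; `calculate_prompt_cost` could be supplied in place of the stand-in.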
References (3)
- TokenCost GitHub: Client-side token counting for 400+ LLM models
- TokenCost PyPI: MIT license, open-source
- tiktoken GitHub: OpenAI tiktoken tokenizer for accurate token counting
Source and acknowledgements
Created by AgentOps-AI. Licensed under MIT.
tokencost — ⭐ 2,000+
Thanks to the AgentOps team for making LLM cost tracking simple and accessible.
Related assets
WebLLM — Run Large Language Models Directly in the Browser
WebLLM is an MLC project that brings LLM inference to web browsers using WebGPU. It runs models like LLaMA, Mistral, and Phi entirely client-side with no server required, enabling private AI chat and text generation from any modern browser.
KoboldCpp — Single-File Local LLM Inference Engine
KoboldCpp is a self-contained local LLM inference engine that runs GGUF models with GPU acceleration on consumer hardware, providing an OpenAI-compatible API and built-in web UI without requiring Python or complex setup.
llm.c — LLM Training in Simple Raw C/CUDA
Train large language models in pure C and CUDA without any deep learning framework. Created by Andrej Karpathy, llm.c demonstrates that GPT-2 training can be expressed in roughly 1,000 lines of C code.
LM Evaluation Harness — Unified LLM Benchmarking Framework
EleutherAI's framework for reproducible evaluation of language models across hundreds of benchmarks, providing the standard evaluation backend used by the Open LLM Leaderboard and research papers.