Practical Notes
- Per the README: it shows a context “fuel gauge” (default limit 200,000 tokens) and saves prompts to Markdown and JSON.
- Useful for regression testing: compare token usage before and after prompt or tool changes.
- Combine with guardrails: when the fuel gauge hits 70–80% of the limit, switch to summarization or retrieval mode.
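The guardrail bullet can be sketched as a simple threshold check. The 200,000 limit matches the README's stated default, but the mode names and the 70%/80% cutoffs below are illustrative assumptions, not part of Tokentap itself:

```python
# Hypothetical guardrail: pick a strategy from how full the context window is.
# Thresholds and mode names are assumptions; only the 200,000 default limit
# comes from the README.

CONTEXT_LIMIT = 200_000

def choose_mode(tokens_used: int, limit: int = CONTEXT_LIMIT) -> str:
    """Return a coarse strategy based on context-window fill ratio."""
    ratio = tokens_used / limit
    if ratio < 0.70:
        return "normal"      # plenty of headroom: keep appending context
    if ratio < 0.80:
        return "summarize"   # 70-80%: compress older turns into a summary
    return "retrieve"        # above 80%: drop raw history, fetch on demand

print(choose_mode(50_000))   # normal
print(choose_mode(150_000))  # summarize
print(choose_mode(170_000))  # retrieve
```

A real hook would read the fuel-gauge value from Tokentap's saved JSON rather than take it as an argument, but the branching logic is the same.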
Main
A simple workflow that pays off quickly:
- Run your normal CLI session with Tokentap enabled.
- When usage spikes, open the saved prompt archive and identify the culprit: retrieval payload, tool output, or template bloat.
- Fix one thing at a time (shorten tool output, add truncation, or dedupe context), then measure again.
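Two of the one-at-a-time fixes above, truncating tool output and deduplicating context, can be sketched in a few lines. The function names and the character limit are assumptions for illustration, not Tokentap APIs:

```python
# Illustrative fixes for two common token sinks: oversized tool output and
# duplicated context chunks. Names and limits here are assumptions.

def truncate_tool_output(text: str, max_chars: int = 2_000) -> str:
    """Keep the head of a tool result and mark the cut explicitly."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"\n[truncated {len(text) - max_chars} chars]"

def dedupe_context(chunks: list[str]) -> list[str]:
    """Drop exact-duplicate chunks while preserving first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            out.append(chunk)
    return out

print(dedupe_context(["retrieved doc", "tool log", "retrieved doc"]))
# ['retrieved doc', 'tool log']
```

Apply one change, re-run the session with Tokentap enabled, and compare the archived counts before reaching for the next fix.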
Treat token usage as a budget: you’ll get better answers by spending tokens on relevant evidence, not repeated boilerplate.
FAQ
Q: Does it require certificates? A: Per the README, no: it advertises "Zero configuration" and runs as a local proxy with path-prefix routing for OpenAI-compatible providers.
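A quick sketch of what path-prefix routing implies for a client: the proxy picks the upstream provider from the first path segment, so a CLI only changes its base URL. The port and prefix names below are assumptions, not Tokentap's actual defaults; check its README for the real values:

```python
# Hypothetical path-prefix routing: provider is named by the first path
# segment of the local proxy's URL. Port and prefixes are assumptions.
from urllib.parse import urljoin

PROXY = "http://localhost:8080/"  # hypothetical local proxy address

def proxied_base_url(provider_prefix: str) -> str:
    """Build the base URL a CLI would use instead of the provider's own."""
    return urljoin(PROXY, provider_prefix + "/v1")

print(proxied_base_url("openai"))  # http://localhost:8080/openai/v1
```

Because traffic goes to a plain local HTTP endpoint rather than through an intercepting proxy, no custom CA certificate is involved.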
Q: Can it run with Gemini CLI? A: The README notes that Gemini CLI is currently blocked by an upstream issue when using OAuth; check the linked issue for status.
Q: What should I store? A: Keep prompt archives in a private directory; they may contain secrets or code. Add redaction if you share logs.
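A minimal redaction pass for shared logs might look like the sketch below. The patterns are illustrative assumptions; extend them for whatever secrets your prompts can actually leak:

```python
# Illustrative redaction before sharing archived prompts. The patterns are
# assumptions; add site-specific ones for your own secret formats.
import re

PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"(?i)(password|secret|token)\s*[:=]\s*\S+"), r"\1=[REDACTED]"),
]

def redact(text: str) -> str:
    """Replace secret-looking substrings with placeholder markers."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("password: hunter2"))  # password=[REDACTED]
```

Run this over the Markdown and JSON archives before attaching them to a bug report or pasting them into an issue.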