Introduction
DeepReasoning is a Rust-based inference proxy that combines DeepSeek R1 chain-of-thought reasoning with Anthropic Claude model outputs. It exposes an OpenAI-compatible API and a chat UI, enabling applications to benefit from explicit reasoning traces while leveraging Claude for final responses.
What DeepReasoning Does
- Proxies inference requests through DeepSeek R1 for chain-of-thought reasoning and Claude for final output
- Exposes an OpenAI-compatible REST API for drop-in integration
- Provides a web-based chat UI that displays reasoning traces alongside responses
- Streams both reasoning steps and final answers in real-time
- Supports configurable model selection for each stage of the pipeline
Architecture Overview
DeepReasoning runs a Rust HTTP server built on Axum. Incoming requests are first sent to DeepSeek R1 to generate a chain-of-thought reasoning trace. The trace is then included as context in a follow-up request to Claude, which produces the final response. Both stages stream tokens to the client as they arrive. The chat UI is a bundled frontend that renders reasoning traces in collapsible sections.
Self-Hosting & Configuration
- Clone and build with Cargo (requires Rust 1.70+)
- Set DEEPSEEK_API_KEY and ANTHROPIC_API_KEY as environment variables
- Configure model versions, temperature, and max tokens via config.toml
- Runs on localhost by default; configurable port and host binding
- Docker image available for containerized deployment
Key Features
- Combines chain-of-thought reasoning from DeepSeek R1 with Claude output quality
- OpenAI-compatible API allows integration with existing tools and libraries
- Rust implementation handles concurrent requests with minimal resource usage
- Web UI displays reasoning traces for transparency and debugging
- Supports streaming for both reasoning and response phases
Comparison with Similar Tools
- OpenRouter — multi-model API gateway; DeepReasoning specifically chains reasoning and response across two models
- LiteLLM — unified LLM proxy; DeepReasoning adds a two-stage reasoning pipeline
- Portkey — LLM gateway with caching; DeepReasoning focuses on CoT integration rather than routing
- Jan — offline AI desktop app; DeepReasoning is an API server for programmatic access
- LobeChat — multi-model chat UI; DeepReasoning provides a specialized reasoning-chain interface
FAQ
Q: Do I need both DeepSeek and Anthropic API keys? A: Yes. The two-stage pipeline requires access to both providers.
Q: Can I use other models instead of DeepSeek R1 or Claude? A: The architecture supports any OpenAI-compatible reasoning model for stage one and any Anthropic model for stage two. Custom providers can be configured.
Q: What is the latency overhead of the two-stage approach? A: Total latency is roughly the sum of both API calls, but streaming mitigates perceived delay since reasoning tokens appear immediately.
Q: Is this suitable for production use? A: The Rust server is production-grade in terms of performance and reliability. Cost and latency depend on the underlying API providers.