TOKREPO · ARSENAL

Stable

China AI API Alternatives — OpenAI / Claude Replacements You Can Pay in RMB

Ten Chinese-origin AI APIs and gateways: DeepSeek-V3 / R1 / Coder, Qwen, ChatGLM, Kimi (Moonshot), MiniMax, plus One API / LiteLLM Proxy / Cherry Studio for routing and failover. RMB billing, fapiao-friendly, OpenAI-SDK compatible, no VPN required.

10 assets

About this pack

What this pack actually solves

A team in mainland China shipping an AI feature hits the same three walls. (1) The OpenAI / Anthropic billing flow needs a foreign card and a foreign-IP signup — practical for a side project, painful at company scale. (2) Outbound API calls to api.openai.com / api.anthropic.com are unreliable from inside China without a proxy, and a corporate proxy raises its own compliance questions. (3) Finance wants a fapiao (增值税发票) for the spend, which foreign SaaS does not issue.

This pack picks ten assets that close all three gaps without giving up GPT-4o-class quality. Three tiers: SOTA frontier (DeepSeek-V3, R1, Coder), Chinese-vendor alternates (Qwen, ChatGLM, Kimi, MiniMax), and gateways that hide vendor choice behind one OpenAI-compatible endpoint (One API, LiteLLM Proxy, Cherry Studio). End state: your code stays unchanged — base_url and model are the only diffs — and finance gets a fapiao every month.

Install in this order — free / cheap first, then quality, then routing

DeepSeek-V3 — Open-Weight 671B MoE with GPT-4o Quality (id 2832). Start here. The hosted API is $0.27 / 1M input tokens — roughly 10× cheaper than GPT-4o — billed in RMB at platform.deepseek.com. OpenAI-compatible: switch base_url to https://api.deepseek.com/v1, set model="deepseek-chat", ship. Weights are MIT-licensed if you ever want to bring it in-house.
Qwen Code — Terminal Coding Agent for Qwen Models (id 3022). Alibaba Cloud DashScope hosts Qwen2.5 / Qwen3 with OpenAI-compatible endpoints and RMB billing. Qwen Code is the CLI wrapper most domestic devs use to wire Qwen into a coding loop without writing the SDK glue themselves. Pair the API with this CLI on day one.
ChatGLM — Open Bilingual Chat Model by Tsinghua KEG (id 2264). The classic Zhipu / Tsinghua bilingual line. The hosted GLM-4 series on bigmodel.cn ships in RMB, supports fapiao, and remains the strongest non-DeepSeek choice when the workload is bilingual customer-facing — translation, support, content moderation in mixed zh/en.
oh-my-kimi — Evidence-Gated Agent Runtime for Kimi (id 3643). Moonshot's Kimi line is the long-context champion in the domestic market (200K+ tokens, document-grade comprehension). Wire it for the "feed in the whole contract / RFP / manual" workload that GPT-4o handles uneconomically. oh-my-kimi adds the agent-runtime layer so the model is not just a chat box.
MiniMax-MCP — Official MiniMax MCP Server (id 3932). MiniMax's strength is voice and multi-modal (TTS, voice cloning, video-to-text). The official MCP server gives any MCP-aware client (Claude Desktop, Cursor, Cline) a tool surface for those models. Install this once you need anything beyond text.
DeepSeek-R1 — Open-Weight Reasoning Model Rivaling OpenAI o1 (id 2833). The reasoning sibling of V3. Same API surface, different model alias (deepseek-reasoner). Route hard reasoning calls here, route everything else to V3 — the gateway in step 8 makes that one config line.
DeepSeek Coder — Code-Specialized Model for Local Inference (id 2834). The smaller, code-specialized branch — runs locally on a workstation GPU and removes the network hop entirely for code completion or in-IDE refactor. Belongs on a developer's laptop, not in your production gateway.
One API — Unified LLM API Gateway (Docker) (id 3821). The single most important install in this pack. One API is the open-source, China-friendly OpenAI-compatible gateway — it speaks the OpenAI request/response format and routes calls to DeepSeek, Qwen, ChatGLM, Kimi, MiniMax, Azure, Anthropic, and dozens more. Run it in Docker on your own VPC, point your code at it, and switching vendors becomes a config row, not a code change. RMB billing for downstream providers is preserved.
LiteLLM Proxy — Unified Gateway for 100+ LLM APIs (id 2789). The Western-built equivalent. Use it instead of One API if your team already speaks Python, wants per-key cost tracking out of the box, or needs the more mature failover and rate-limit logic. Same shape — one OpenAI-compatible endpoint, many providers behind it.
Cherry Studio Custom Models — BYOK Any LLM Provider (id 2821). The desktop client. Before you wire any of the above into production, paste your keys into Cherry Studio and sanity-check that the model actually answers in the way your app expects. It is the cheapest way to compare DeepSeek vs Qwen vs Kimi on real prompts before you commit a routing decision.

How they fit together

          [ Frontier tier ]              [ Vendor-alternate tier ]      [ Local fallback ]
          DeepSeek-V3 (general)          Qwen (bilingual / Alibaba)     DeepSeek Coder (laptop)
          DeepSeek-R1 (reasoning)        ChatGLM (Zhipu / Tsinghua)
                  │                      Kimi  (long context, Moonshot)
                  │                      MiniMax (voice / multi-modal)
                  │                              │
                  └────────── One API ───────────┤
                              (Docker, OpenAI-compatible)
                                       │
                              LiteLLM Proxy (alt routing layer)
                                       │
                                Your app code
                                       │
                              Cherry Studio (dev sanity check)

The spine is DeepSeek-V3 + One API + LiteLLM Proxy + Cherry Studio. That quartet covers the general workload, hides vendor choice behind one endpoint, gives you a routing layer with cost tracking, and gives a desktop client for manual eyeballing. Qwen / ChatGLM / Kimi / MiniMax / DeepSeek-R1 / Coder are alternate backends you add behind the gateway as the workload demands a different specialty.

Tradeoffs to know before you commit

Context length. Kimi leads (200K+), Qwen and DeepSeek sit in the 64K-128K band for most plans, GLM-4 around 128K. If the workload is "summarize this 80-page PDF," Kimi is the only domestic answer that does not require chunking.
English quality. All five families speak English. DeepSeek and Qwen are closest to GPT-4o on English benchmarks; GLM and Kimi are slightly behind on creative English but ahead on Chinese. Test on your prompts — benchmark averages do not predict your workload.
Pricing units. Most providers price per million tokens in RMB on their dashboard and in USD on the public API page. The numbers are not always one-to-one — sometimes the RMB rate is cheaper, sometimes not. Always log usage to your own ledger; do not trust the dashboard alone for cost reconciliation.
Rate limits and concurrency. Frontier models on a new account often start at 60 RPM / 1 concurrent request. Production workloads need a tier-up application — sometimes a phone call. Budget a week of lead time before launching anything that needs > 10 RPS.
Reasoning model latency. R1, GLM-4-Reasoning, Qwen-QwQ trade speed for quality. A reasoning call can take 30-90 seconds — wire a streaming response or a job queue, do not let it block a synchronous request.

Common pitfalls

Hardcoding the vendor SDK. If you import from openai import OpenAI everywhere and only switch base_url, you stay portable. If you import dashscope or zhipuai directly, you are now coupled to one vendor and cannot route through a gateway. Use the OpenAI SDK against the OpenAI-compatible endpoint every vendor publishes.
Treating ICP / 备案 like an API concern. ICP filing is for the public-facing site hosted in China, not for the API call itself. You do not need a 备案 to call DeepSeek's API from a server outside China. You do need a 备案 if your domain is *.cn and serves users in China. Keep the two questions separate.
Forgetting that token counting differs. OpenAI tokenization (tiktoken / cl100k) does not match DeepSeek / Qwen / GLM tokenization. A 1000-character Chinese prompt that is 350 OpenAI tokens may be 280 DeepSeek tokens. Cost projections built on tiktoken will be off by 10-30% — always count with the vendor's own tokenizer for the budget that matters.
Running tests against the production endpoint with the prod key. Every vendor in this pack has a sandbox / test key. Use it. Otherwise the first time your CI loops gets you a $200 surprise from a Qwen-Max test.
Skipping the fapiao setup. The major vendors (DeepSeek, Alibaba Cloud, Zhipu, Moonshot, MiniMax) all support fapiao but each has a different form. Get finance to file the request the same week you sign up — the first month's invoice cycle takes 30 days, and they will not back-issue.
Assuming deepseek-chat is always V3. Vendor aliases change. deepseek-chat points to the current chat flagship; today that is V3, tomorrow it could be V3.5 / V4. If your eval depends on a specific weight, pin the explicit version string, not the alias.

INSTALL · ONE COMMAND

$ tokrepo install pack/china-ai-api-alternatives

hand it to your agent — or paste it in your terminal

What's inside

10 assets in this pack

Skill#01

DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

DeepSeek-V3 is a 671B-param MoE model (37B active per token). Matches GPT-4o on benchmarks. MIT-licensed weights, $0.27/1M input on the hosted API.

by DeepSeek·204 views

$ tokrepo install deepseek-v3-open-weight-671b-moe-model-with-gpt-4o-quality

Skill#02

DeepSeek-R1 — Open-Weight Reasoning Model Rivaling OpenAI o1

DeepSeek-R1 is the open-weight reasoning model that matches OpenAI o1 on math, code, science benchmarks. Streaming chain-of-thought visible. MIT-licensed.

by DeepSeek·197 views

$ tokrepo install deepseek-r1-open-weight-reasoning-model-rivaling-openai-o1

Skill#03

DeepSeek Coder — Code-Specialized Model for Local Inference

DeepSeek Coder is the code-specialized open-weight model with FIM (fill-in-middle) support. Beats Codestral on HumanEval. Drops into Continue, Aider.

by DeepSeek·241 views

$ tokrepo install deepseek-coder-code-specialized-model-for-local-inference

Script#04

Qwen Code — Terminal Coding Agent for Qwen Models

Qwen Code is an open-source terminal coding agent for Qwen models. Node 22+, npm or Homebrew install, /auth flow, codebase Q&A, refactors, and tests.

by QwenLM·243 views

$ tokrepo install qwen-code-terminal-coding-agent-for-qwen-models

Skill#05

ChatGLM — Open Bilingual Chat Model by Tsinghua KEG

ChatGLM is a family of open bilingual language models from Tsinghua University that support English and Chinese conversation, code generation, and tool use, with variants optimized for consumer GPU deployment.

by Script Depot·220 views

$ tokrepo install chatglm-open-bilingual-chat-model-tsinghua-keg-98bef1e7

Script#06

oh-my-kimi — Evidence-gated Agent Runtime for Kimi

oh-my-kimi (OMK) is a CLI runtime that adds evidence gates and worktree isolation to Kimi Code; verified 69★ and ships `omk init/doctor/chat`.

by Skill Factory·236 views

$ tokrepo install oh-my-kimi-evidence-gated-agent-runtime-for-kimi

MCP#07

MiniMax-MCP — Official MiniMax MCP Server

Official MiniMax Model Context Protocol server exposing media-generation tools (audio, image, video, music); verified 1474★, pushed 2026-05-14.

by MCP Hub·233 views

$ tokrepo install minimax-mcp-official-minimax-mcp-server

Skill#08

One API — Unified LLM API Gateway (Docker)

One API is a self-hosted LLM API gateway: unify OpenAI/Claude/Gemini/DeepSeek endpoints, manage keys, and deploy via Docker in minutes (33.7k★).

by AI Open Source·247 views

$ tokrepo install one-api-unified-llm-api-gateway-docker

Agent#09

LiteLLM Proxy — Unified Gateway for 100+ LLM APIs

LiteLLM Proxy maps 100+ LLM providers (Anthropic, OpenAI, Bedrock, Vertex) to one OpenAI-compatible endpoint. Auth, rate limit, cost track, fallbacks.

by LiteLLM (BerriAI)·289 views

$ tokrepo install litellm-proxy-unified-gateway-for-100-llm-apis

Skill#10

Cherry Studio Custom Models — BYOK Any LLM Provider

Cherry Studio Custom Models adds any OpenAI-compatible endpoint — proxy, local, or third-party. Mix Claude, GPT, Gemini, DeepSeek, Ollama side-by-side.

by Cherry Studio·317 views

$ tokrepo install cherry-studio-custom-models-byok-any-llm-provider

FAQ

Frequently asked questions

DeepSeek vs Qwen — which one do I pick if I only install one?

DeepSeek-V3 for cost-sensitive English-leaning workloads, Qwen for Alibaba-stack integration and bilingual customer-facing work. DeepSeek's API is the cheapest path to GPT-4o-class quality (around $0.27 / 1M input tokens) and the weights are MIT-licensed if you ever need to self-host. Qwen's edge is integration depth — it ships native on Alibaba Cloud DashScope alongside the rest of your stack, has the strongest bilingual answer quality for customer-facing UI, and the Qwen Code CLI gives you a coding loop without writing SDK glue. If the workload is API-only and English-heavy, install DeepSeek first. If you already pay Alibaba for compute / storage and want one vendor for everything, install Qwen first.

Can I get a fapiao (增值税发票) for API spend?

Yes, from every major vendor in this pack, but the workflow differs. DeepSeek issues fapiao monthly from the platform.deepseek.com billing page — submit your company tax ID once, then click 申请发票 each month. Alibaba Cloud (Qwen) issues through the standard Aliyun fapiao portal — fully automated once your account is verified as a 企业账号. Zhipu (ChatGLM) and Moonshot (Kimi) require a one-time email to their finance team with your company info, after which monthly fapiao is automatic. MiniMax is similar. Budget a 30-day lag between first spend and first fapiao — vendors invoice in arrears.

What does enterprise compliance actually need to look at?

Five things. (1) Data residency — confirm the API endpoint is mainland-China hosted; DeepSeek, Alibaba Cloud DashScope, Zhipu, Moonshot, MiniMax all run in mainland China. (2) Prompt / response logging — every vendor's privacy policy says they may log calls for safety review; if your workload includes PII or trade secrets, either redact before sending or negotiate a no-log enterprise plan. (3) Cross-border transfer — if your application sends data from a non-CN user to a CN-hosted API, the PIPL cross-border rules apply; check with counsel. (4) Algorithmic filing (算法备案) — if you ship a public-facing generative AI feature in China, the CAC requires a 算法备案 filing; this is on you, not on the model vendor. (5) Fapiao + contracts — make sure the entity you contract with matches the entity issuing the fapiao; mismatched entities cause finance reconciliation headaches.

How fast are these APIs compared to OpenAI / Anthropic?

For latency from a mainland-China network, all five domestic vendors beat OpenAI / Anthropic by a wide margin — the round-trip stays inside China and avoids the international hop. DeepSeek-V3 and Qwen chat models stream first tokens in 200-500ms typically; GLM-4 and Kimi land in similar territory. Reasoning models (DeepSeek-R1, Qwen-QwQ, GLM-4-Reasoning) are slower — 30-90 seconds for a single answer because they generate internal chain-of-thought before responding — but this is intrinsic to the model class, not a China-vs-foreign issue. Compared to OpenAI / Anthropic from inside China without a proxy, the domestic vendors are not just faster, they are reachable.

Can I keep using the OpenAI SDK or do I have to rewrite everything?

Keep the OpenAI SDK. Every vendor in this pack publishes an OpenAI-compatible endpoint, and One API / LiteLLM Proxy in front gives you one OpenAI-compatible endpoint for all of them. In practice: install the OpenAI SDK, set base_url to your gateway, set model to whatever the gateway routes (deepseek-chat, qwen-max, glm-4, moonshot-v1-128k, etc.), and the rest of your code — chat.completions.create, streaming, tool use, JSON mode — is unchanged. The only thing the OpenAI-compatible layer does not always cover is bleeding-edge OpenAI-only features (the Realtime API, the Assistants API, GPT image edits). For 95% of LLM workloads, the SDK swap is one config line.

12 packs · 80+ hand-picked assets

Browse every curated bundle on the home page

Back to all packs