[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"pack-detail-i18n-translation-pipeline-scale-en":3,"seo:pack:i18n-translation-pipeline-scale:en":97},{"code":4,"message":5,"data":6},200,"Operation successful",{"pack":7},{"slug":8,"icon":9,"tone":10,"status":11,"status_label":12,"title":13,"description":14,"items":15,"install_cmd":96},"i18n-translation-pipeline-scale","🌍","#22D3EE","new","New · this week","i18n Translation Pipeline at Scale","Ten picks for the app team shipping to 10+ languages and tired of paying per-string SaaS prices. CI-driven pipeline: extract keys via Weblate\u002FTolgee → AI translate with OpenAI SDK or self-hosted Transformers\u002FLibreTranslate → glossary check with Vale → grammar QA with LanguageTool → spell-check with typos → reinject. pre-commit and markdownlint hold the line on every PR.",[16,28,35,42,49,57,64,72,82,89],{"id":17,"uuid":18,"slug":19,"title":20,"description":21,"author_name":22,"view_count":23,"vote_count":24,"lang_type":25,"type":26,"type_label":27},1776,"cb2ceff8-3bca-11f1-9bc6-00163e2b0d79","weblate-web-based-continuous-localization-platform-cb2ceff8","Weblate — Web-Based Continuous Localization Platform","A web-based translation management system with tight version control integration. 
Weblate automates the localization workflow with translation memory, machine translation, and quality checks.","AI Open Source",120,0,"en","skill","Skill",{"id":29,"uuid":30,"slug":31,"title":32,"description":33,"author_name":22,"view_count":34,"vote_count":24,"lang_type":25,"type":26,"type_label":27},2716,"5b96a366-48e2-11f1-9bc6-00163e2b0d79","tolgee-developer-friendly-localization-platform-5b96a366","Tolgee — Developer-Friendly Localization Platform","An open-source localization platform that lets developers and translators manage translations through a web UI, in-context editing, and native SDK integrations for React, Vue, Angular, and more.",112,{"id":36,"uuid":37,"slug":38,"title":39,"description":40,"author_name":22,"view_count":41,"vote_count":24,"lang_type":25,"type":26,"type_label":27},1336,"3109a712-381e-11f1-9bc6-00163e2b0d79","libretranslate-self-hosted-translation-api-no-rate-limits-3109a712","LibreTranslate — Self-Hosted Translation API with No Rate Limits","LibreTranslate is a self-hostable translation API powered by open-source Argos Translate models. No API keys, no rate limits, no data sent to third parties — a drop-in replacement for Google Translate when privacy matters.",212,{"id":43,"uuid":44,"slug":45,"title":46,"description":47,"author_name":48,"view_count":23,"vote_count":24,"lang_type":25,"type":26,"type_label":27},1298,"b0920ac9-37db-11f1-9bc6-00163e2b0d79","hugging-face-transformers-universal-library-pretrained-b0920ac9","Hugging Face Transformers — The Universal Library for Pretrained Models","transformers is the de-facto Python library for using and fine-tuning pretrained models — BERT, GPT, Llama, Whisper, ViT, and 250,000+ others. 
One unified API works across PyTorch, TensorFlow, and JAX.","Hugging Face",{"id":50,"uuid":51,"slug":52,"title":53,"description":54,"author_name":55,"view_count":56,"vote_count":24,"lang_type":25,"type":26,"type_label":27},3109,"c0cc4d66-d935-43f1-a394-8222c4c15c31","openai-python-official-openai-python-sdk","openai-python — Official OpenAI Python SDK","Call the OpenAI REST API from Python 3.9+ with typed request\u002Fresponse models and sync\u002Fasync clients. Use it as a core SDK for agents and app backends.","Agent Toolkit",77,{"id":58,"uuid":59,"slug":60,"title":61,"description":62,"author_name":22,"view_count":63,"vote_count":24,"lang_type":25,"type":26,"type_label":27},1865,"13b1fee7-3cf7-11f1-9bc6-00163e2b0d79","vale-syntax-aware-prose-linter-technical-writing-13b1fee7","Vale — Syntax-Aware Prose Linter for Technical Writing","Vale is a command-line tool that enforces writing style guides on your prose, supporting custom rules for documentation teams to ensure consistent terminology, tone, and formatting across Markdown, AsciiDoc, and more.",71,{"id":65,"uuid":66,"slug":67,"title":68,"description":69,"author_name":70,"view_count":71,"vote_count":24,"lang_type":25,"type":26,"type_label":27},2308,"29fd01ff-431d-11f1-9bc6-00163e2b0d79","languagetool-self-hosted-grammar-style-checker-25-languages-29fd01ff","LanguageTool — Self-Hosted Grammar and Style Checker for 25+ Languages","An open-source grammar, style, and spell checker that supports over 25 languages and can be self-hosted as an HTTP API server for private proofreading.","Script Depot",165,{"id":73,"uuid":74,"slug":75,"title":76,"description":77,"author_name":78,"view_count":79,"vote_count":24,"lang_type":25,"type":80,"type_label":81},3030,"43250786-476f-4adc-a40e-751fba4100e4","typos-source-code-spell-checker-for-ci","typos — Source Code Spell Checker for CI","typos catches spelling mistakes in code, docs, config, and comments with low false positives. 
Run it locally, in pre-commit, or as a CI gate.","crate-ci",9,"script","Script",{"id":83,"uuid":84,"slug":85,"title":86,"description":87,"author_name":70,"view_count":88,"vote_count":24,"lang_type":25,"type":26,"type_label":27},751,"2f24f820-a8de-430f-87d7-945401c6a0e3","markdownlint-lint-markdown-ai-content-quality-2f24f820","Markdownlint — Lint Markdown for AI Content Quality","Node.js markdown linter with 50+ rules. Ensure consistent formatting in CLAUDE.md, .cursorrules, README files, and AI-generated documentation across your project.",180,{"id":90,"uuid":91,"slug":92,"title":93,"description":94,"author_name":70,"view_count":95,"vote_count":24,"lang_type":25,"type":26,"type_label":27},1265,"69a51c48-37b5-11f1-9bc6-00163e2b0d79","pre-commit-framework-managing-git-hook-scripts-69a51c48","pre-commit — A Framework for Managing Git Hook Scripts","pre-commit manages and installs multi-language Git hooks from a YAML file. It runs linters, formatters, and checks before commits reach CI — catching issues early with zero manual setup per developer.",128,"tokrepo install pack\u002Fi18n-translation-pipeline-scale",{"pageType":98,"pageKey":8,"locale":25,"title":99,"metaDescription":100,"h1":101,"tldr":102,"bodyMarkdown":103,"faq":104,"schema":120,"internalLinks":125,"citations":138,"wordCount":151,"generatedAt":152},"pack","i18n Translation Pipeline at Scale — 10 Tools for an Automated CI-Driven Localization Stack","Weblate, Tolgee, LibreTranslate, Transformers, OpenAI SDK, Vale, LanguageTool, typos, markdownlint, pre-commit — the 10-asset stack an app team uses to automate extract → AI translate → glossary check → grammar QA → reinject in CI. Per-PR gates. No per-string SaaS bills.","i18n Translation Pipeline at Scale — 10 Picks for a CI-Driven Localization Stack","Ten picks for the dev team that has to ship in 10+ languages without staffing a localization vendor. 
The pipeline runs in CI: pre-commit gates each PR, Weblate or Tolgee extracts keys, the OpenAI SDK or a self-hosted Transformers\u002FLibreTranslate model produces translations, then Vale + LanguageTool + typos + markdownlint enforce glossary, grammar, spelling, and markdown shape before anything reinjects. Different from a translator's stack — this is for the team automating the human out of the bulk path.","## What's in this pack\n\nThis is the stack for the **app team shipping in 10+ languages without a dedicated localization vendor** — a backend engineer, a frontend engineer, and a part-time PM who all share the on-call rotation and cannot afford a per-string SaaS bill that scales linearly with the product. The job stops being \"translate strings\" and starts being \"keep translations in sync with main on every PR without a human in the loop for the bulk path\".\n\nIt is not the same job as the one in the [translator's multi-lingual stack](\u002Fen\u002Fpacks\u002Ftranslator-multilingual-stack). That pack is for the human translator and localization engineer running a real pipeline with translators in the loop. **This pack is for the dev team that wants to automate that pipeline end-to-end in CI** and only escalate to a human reviewer when a gate fails. Same five stages — extract, translate, QA, validate, reinject — but the picks change because the operator changes.\n\nThe difference shows up in the tool choices. We keep Weblate and Tolgee because every pipeline still needs a TMS, but we add `pre-commit` (the CI gate orchestrator), `typos` (CI-friendly spell-checker), `markdownlint` (so the translated `.md` files don't break the docs build), the `openai-python` SDK (the LLM caller in your translation script), and `transformers` (so you can fine-tune or host an NMT model on your own GPU without an API bill). The bulk-translate path becomes code, not clicks.\n\n## Install in this order (extract → AI translate → QA gate → reinject)\n\n1. 
**Weblate** — the TMS that holds the source of truth. Start here because every other tool either feeds it or reads from it. Weblate watches your git repo, extracts strings from gettext\u002Fxliff\u002Fjson\u002FAndroid XML\u002Fproperties, and pushes completed translations back as commits. Self-hosted on Docker, configured against your existing GitHub or GitLab repo, this is the foundation.\n2. **Tolgee** — the developer-friendly alternative. Pick Tolgee instead of Weblate if your reviewers are PMs and designers who need to see strings in-context on the running app (alt-click). Pick Weblate if your reviewers live in PRs. Most teams pick one and stick with it for years; both are listed because the right answer depends on who reviews translations.\n3. **LibreTranslate** — the self-hosted NMT engine for the bulk path. Wire this into Weblate's automatic-suggestion backend so every new string gets a machine translation before a human ever sees it. No per-token cost, no rate limit, no compliance review for sending pre-launch strings to a third-party SaaS. The first 80% of UI strings ship through LibreTranslate without further escalation.\n4. **Hugging Face Transformers** — when LibreTranslate's Argos model isn't fluent enough for your target languages and you need to fine-tune. Load NLLB-200 or M2M-100, fine-tune on your existing translation memory (export it as a TMX file from Weblate), and serve from your own GPU. This is the escape hatch for low-resource languages and post-edit-heavy locales where the off-the-shelf NMT loses fluency.\n5. **openai-python (or any LLM SDK)** — the context-aware translator for strings that have to read like a human wrote them. Marketing copy, error messages users see in their language, onboarding screens. Your translation script reads the source string + screenshot URL + glossary + the last 3 translations of similar strings, builds a prompt, calls the LLM, and writes the result back to Weblate. Always pass the glossary in the prompt. 
Always.\n6. **Vale** — the terminology gate. Configure a rule pack with your forbidden terms (`login` → `sign in`), brand terms that must never be translated (`Pull Request`, `Slack`), and tone constraints per locale (formal `Sie` in German, informal `tu` in French marketing). Vale runs on every PR via pre-commit. A glossary violation fails the build. No exceptions, no soft warnings.\n7. **LanguageTool** — the grammar and style gate. Run it on the **translated** output, not the source. Catches the silent class of bugs where the translation is grammatically wrong in ways a non-speaker reviewer would never notice — German case, French agreement, Spanish ser\u002Festar, Russian plural forms. Self-hosted as an HTTP API in your CI cluster.\n8. **typos** — the spell-check gate. Rust-fast, ships as a single binary, runs in pre-commit. Catches the `recieve` \u002F `recieved` \u002F `seperator` class of bugs that survive LLM translation because the LLM was trained on the internet, which contains those typos at scale. Configure a per-locale dictionary for product names and you're done.\n9. **markdownlint** — the structure gate for translated docs. When your `README.md` ships in 10 locales, you cannot have one locale's translation silently break the heading hierarchy, mismatch a list indent, or close a code fence in the wrong place. markdownlint catches all three. Run it on every translated `.md` in CI.\n10. **pre-commit** — the orchestrator that wires the four gates together. One `.pre-commit-config.yaml` runs typos + Vale + LanguageTool + markdownlint on every staged file before commit and again in CI. If any gate fails, the commit fails, the PR fails, nothing reinjects. 
This is the file that turns the stack from \"a pile of tools we ran once\" into \"a pipeline that holds the line on every PR\".\n\n## How they fit together (CI-driven pipeline)\n\n```\n  source content (po \u002F xliff \u002F json \u002F Android XML \u002F md)\n        │\n        ▼\n  ┌──── Weblate (or Tolgee) ─────┐\n  │   extract on git push        │\n  │   ─────────────────────────  │\n  │   present strings via REST   │\n  └──────────────┬───────────────┘\n                 ▼\n     ┌──── translation script ────┐\n     │  for each string:          │\n     │   • lookup translation memory │\n     │   • build prompt (glossary + screenshot) │\n     │   • route by string type:  │\n     │      marketing → OpenAI SDK│\n     │      UI bulk   → LibreTranslate │\n     │      hard lang → Transformers (fine-tuned) │\n     │   • write back to Weblate  │\n     └──────────────┬─────────────┘\n                    ▼\n        ┌──── pre-commit gates ────┐\n        │  Vale       (glossary)   │\n        │  LanguageTool (grammar)  │\n        │  typos      (spelling)   │\n        │  markdownlint (.md shape)│\n        │   ANY fail = PR fails    │\n        └──────────────┬───────────┘\n                       ▼\n           reinject via Weblate commit → git → build\n```\n\nThe gate row is the load-bearing piece. Without `pre-commit` orchestrating the four checkers, glossary drift, grammar bugs, spelling typos, and broken markdown all leak into production on different days through different paths. With it, every PR either passes all four or doesn't merge.\n\n## Tradeoffs you'll hit\n\n- **OpenAI SDK vs self-hosted Transformers vs LibreTranslate** — these three are different cost\u002Fquality\u002Fprivacy points. 
LLM via OpenAI SDK is highest quality for context-sensitive strings (marketing, errors) and costs cents per thousand strings; LibreTranslate is free and runs in your VPC but loses fluency on low-resource languages; Transformers fine-tuned on your own TM is the escape hatch when neither works. The production pattern: route by string type, not by language. Marketing copy → LLM. Bulk UI → LibreTranslate. Locales where LibreTranslate is bad → fine-tuned NMT via Transformers.\n- **Weblate vs Tolgee vs SaaS (Lokalise\u002FCrowdin\u002FPhrase)** — SaaS ships faster but locks you in and prices per string. With a 50,000-string app in 12 locales, the math gets unfriendly fast. Weblate is the right default for teams whose reviewers live in PRs; Tolgee is the right default for teams whose reviewers need in-context editing. Stick with SaaS only when integrations you genuinely use justify the bill.\n- **CI gates as soft warnings vs hard failures** — soft warnings get ignored. Hard failures cause occasional drama when a translation is genuinely fine and the gate is wrong. The right answer is hard failures with a documented override path: an engineer wraps the flagged text in Vale's in-file `<!-- vale off -->` and `<!-- vale on -->` comments with a justification recorded in code review, the PR proceeds, and the override gets audited weekly. Never run the gates as advisory.\n- **Translation memory ownership** — your TM is more sensitive than your code. It contains every release note before launch, every customer support reply, every legal disclaimer. Self-host the TMS (Weblate or Tolgee) and only send strings to a third-party LLM after PII and embargoed-content redaction. The LibreTranslate + self-hosted Transformers path exists for exactly this reason.\n\n## Common pitfalls\n\n- **Placeholder breakage** — the LLM helpfully translates `{username}` to `{nombreusuario}` and the app crashes on next render. 
Turn on Weblate's placeholder check; configure your translation script to lock placeholders before the LLM call and substitute them back after.\n- **Forgetting to pass the glossary in the LLM prompt** — the single most common bug in homegrown translation scripts. Without the glossary, the LLM picks a new word for \"workspace\" every call. The fix is one line in the prompt template; do not skip it.\n- **Routing by language instead of by string type** — \"all French goes through LLM\" sounds cheap until your French marketing copy reads like a robot. Route by string type: marketing → LLM, UI bulk → NMT, regardless of language.\n- **Treating CI gates as advisory** — the moment one engineer overrides a Vale failure without code review, the gate is dead. Either it's a hard fail or it's nothing.\n- **Skipping markdownlint on translated docs** — translated `README.md` files break the docs build at 2am because a Spanish translator put a `*` where a `-` was. markdownlint is the cheapest insurance in this pack; turn it on first.\n- **No human-in-the-loop sampler** — a fully-automated pipeline drifts. Sample 1% of merged translations into a manual review queue weekly. The metrics from that sample tell you which gates need tuning and which locales need a Transformers fine-tune.",[105,108,111,114,117],{"q":106,"a":107},"How is this pack different from the Translator's Multi-Lingual Stack?","Different operator, different framing. The translator pack is built for a human localization engineer running a pipeline with translators in the loop — Weblate, glossary owner, post-edit workflow, format-aware tools for PDF and video. This pack is built for the dev team that wants no human in the bulk path: pre-commit orchestrates Vale, LanguageTool, typos, and markdownlint on every PR; the openai-python SDK or self-hosted Transformers do the bulk translation; humans only see strings the gates rejected. 
Same TMS layer (Weblate, Tolgee, LibreTranslate appear in both because they're the right answer for both jobs), different automation layer.",{"q":109,"a":110},"Why three translation engines instead of one?","Because no single engine is right across the cost-quality-privacy space. OpenAI's API gives you a context-aware LLM that knows {user_name} is a placeholder and that 'trial' in a SaaS app means free-trial, not court case — but you pay per token and send strings to a third party. LibreTranslate runs free in your VPC with no rate limit but is less fluent on low-resource languages. Hugging Face Transformers lets you fine-tune NLLB-200 or M2M-100 on your own translation memory and host it on your own GPU — best for the locales where LibreTranslate is bad and OpenAI is expensive. The production pattern is to route by string type, not by language: marketing through the LLM, bulk UI through LibreTranslate, hard locales through fine-tuned Transformers.",{"q":112,"a":113},"Do I really need both Vale and LanguageTool in the pipeline?","Yes — they catch different bug classes. LanguageTool is a grammar checker: it knows German case agreement, French gender agreement, Spanish ser\u002Festar, Russian plural forms, the things a non-native reviewer would never spot. Vale is a style and terminology linter: it enforces your glossary (never say login, always sign in), brand terms (never translate Pull Request), and tone constraints per locale. LanguageTool catches grammar drift, Vale catches policy drift. Running only one of them leaves the other class of bugs in production. Both are cheap to run in CI.",{"q":115,"a":116},"What's the smallest viable version of this pipeline I can ship this week?","Four picks. Weblate (Docker, one afternoon against your git repo). LibreTranslate (one container, wired as Weblate's MT suggestion backend). pre-commit running typos + markdownlint (one .pre-commit-config.yaml, ten minutes). 
And the openai-python SDK in a 100-line translation script that reads Weblate's REST API, calls the LLM for any string tagged 'marketing' with the glossary in the prompt, and writes back. That's the v1: bulk pre-translation by NMT, marketing strings by LLM, two CI gates holding the line. Add Vale + LanguageTool the second week to catch what slipped through. Add Transformers fine-tuning only when you can prove a specific locale needs it.",{"q":118,"a":119},"How do I avoid sending sensitive translation memory to a third-party LLM?","Three layers. First, route by string sensitivity: anything tagged `confidential` or `pre-launch` in Weblate routes to LibreTranslate or your self-hosted Transformers, never to the OpenAI API. Second, run a PII redactor before any LLM call — replace user names, emails, customer IDs with placeholders, swap them back after translation. Third, sign a DPA with your LLM provider and document the data flow in your security review. The pack lists LibreTranslate and Transformers ahead of openai-python for exactly this reason: the self-hosted path is the default, the LLM is the escape hatch for the strings where context wins, not the bulk hammer.",{"@context":121,"@type":122,"name":13,"description":123,"numberOfItems":124,"inLanguage":25},"https:\u002F\u002Fschema.org","ItemList","Ten picks for an app team automating its localization pipeline end-to-end in CI: TMS, three translation engines (LLM + NMT + fine-tunable Transformers), four pre-commit gates for glossary, grammar, spelling, and markdown structure.",10,[126,130,134],{"url":127,"anchor":128,"reason":129},"\u002Fen\u002Fpacks\u002Ftranslator-multilingual-stack","Translator's Multi-Lingual Stack","Companion pack for the human localization engineer running the same pipeline with translators in the loop",{"url":131,"anchor":132,"reason":133},"\u002Fen\u002Fai-tools-for\u002Ftranslation","AI translation tools on TokRepo","The broader catalog of translation and multilingual NLP 
assets",{"url":135,"anchor":136,"reason":137},"\u002Fen\u002Fai-tools-for\u002Fautomation","Automation tools for AI agents","pre-commit and the CI gate orchestration carry over to any agent-driven CI pipeline",[139,143,147],{"claim":140,"source_name":141,"source_url":142},"Weblate is a web-based continuous localization platform with version control integration","Weblate documentation","https:\u002F\u002Fdocs.weblate.org\u002F",{"claim":144,"source_name":145,"source_url":146},"pre-commit is a framework for managing multi-language git hook scripts from a YAML file","pre-commit official site","https:\u002F\u002Fpre-commit.com\u002F",{"claim":148,"source_name":149,"source_url":150},"Hugging Face Transformers supports machine translation models including NLLB-200 and M2M-100","Transformers documentation","https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers",1340,"2026-05-22T15:00:00Z"]