Configs · May 24, 2026 · 1 min read

Transformers.js — Run Hugging Face Models in the Browser

A JavaScript library that brings state-of-the-art machine learning models to the browser and Node.js with an API mirroring Python's Transformers library.

Introduction

Transformers.js brings the Hugging Face ecosystem to JavaScript. It runs ONNX-converted models directly in the browser via WebAssembly or WebGPU, and in Node.js without requiring Python or a GPU server. The API intentionally mirrors the Python transformers library for a familiar developer experience.
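
As an illustration of the pipeline-style API, the sketch below runs sentiment analysis. It assumes the `@huggingface/transformers` package is installed and that the task's default model can be downloaded on first use. The helper names `classify` and `topLabel` are hypothetical and exist only for this example.

```javascript
// Hypothetical helper: pick the highest-scoring entry from the pipeline output,
// which for text classification is an array of { label, score } objects.
function topLabel(results) {
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

// Hypothetical wrapper around the library's pipeline() factory.
// The library is imported lazily, so it is only loaded when first needed.
async function classify(text) {
  const { pipeline } = await import('@huggingface/transformers');
  const classifier = await pipeline('sentiment-analysis');
  return topLabel(await classifier(text));
}

// Usage (downloads the model on the first call):
// classify('I love Transformers.js!').then((r) => console.log(r.label, r.score));
```

The call shape follows the Python `pipeline(task)` pattern: create the pipeline once, then reuse it for every input.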

What Transformers.js Does

  • Runs 100+ pretrained models for NLP, vision, and audio tasks in JavaScript
  • Executes inference locally using ONNX Runtime Web (WebAssembly or WebGPU backend)
  • Supports pipelines for classification, generation, translation, summarization, and embeddings
  • Downloads and caches model weights automatically from the Hugging Face Hub
  • Works in browsers, Node.js, Deno, and edge runtimes like Cloudflare Workers
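
The embeddings use case from the list above could look roughly like the following. The model ID `Xenova/all-MiniLM-L6-v2` is an assumption chosen for illustration (it does not appear in the original text), and `embed` and `cosineSimilarity` are hypothetical helper names.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: embed an array of strings with a feature-extraction pipeline.
async function embed(texts) {
  const { pipeline } = await import('@huggingface/transformers');
  // Assumed model ID; any sentence-embedding model converted to ONNX should work.
  const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  // Mean-pool token embeddings and L2-normalize each resulting vector.
  const output = await extractor(texts, { pooling: 'mean', normalize: true });
  return output.tolist(); // one vector per input text
}

// Usage:
// const [a, b] = await embed(['How do I reset my password?', 'Password reset steps']);
// console.log(cosineSimilarity(a, b));
```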

Architecture Overview

Models are exported from PyTorch to ONNX format and optionally quantized to reduce size. At runtime, Transformers.js loads the ONNX graph into ONNX Runtime Web, which executes operations with WebAssembly (using SIMD where available) or WebGPU compute shaders, depending on the selected backend and browser support. Tokenizers are implemented in JavaScript and are designed to produce the same output as their Python counterparts.
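
Backend and quantization choices can be passed when a pipeline is created. The following is a minimal browser-oriented sketch, assuming version 3 of `@huggingface/transformers`, where `pipeline()` accepts `device` and `dtype` options. `pickDevice` and `loadClassifier` are hypothetical helpers, and the model ID is an assumption chosen for illustration.

```javascript
// Hypothetical helper: choose a backend from a navigator-like object.
// Falls back to WebAssembly when the WebGPU entry point is missing.
function pickDevice(nav) {
  return nav && nav.gpu ? 'webgpu' : 'wasm';
}

// Hypothetical loader: request an 8-bit quantized model on the chosen backend.
async function loadClassifier() {
  const { pipeline } = await import('@huggingface/transformers');
  const device = pickDevice(typeof navigator !== 'undefined' ? navigator : undefined);
  return pipeline(
    'sentiment-analysis',
    'Xenova/distilbert-base-uncased-finetuned-sst-2-english', // assumed model ID
    { device, dtype: 'q8' },
  );
}
```

The presence of `navigator.gpu` does not guarantee that a usable GPU adapter exists, so a production check would likely also request an adapter and fall back if none is returned.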

Self-Hosting & Configuration

  • Install via npm: npm install @huggingface/transformers
  • Models download from the Hugging Face Hub on first use and are cached with the browser's Cache API, or on the filesystem in Node.js
  • Use quantized models (int8/uint8) to reduce download size by roughly 4x compared with 32-bit float weights
  • Configure model loading with custom cache directories or self-hosted model repositories
  • Enable WebGPU backend for GPU-accelerated inference on supported browsers
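
For the self-hosted case, the library exposes an `env` object whose settings control where models are loaded from. The sketch below assumes model files are served under a `/models/` path that mirrors the repository layout on the Hub. That path and the helper names `configureHosting` and `loadSelfHosted` are hypothetical.

```javascript
// Hypothetical helper: apply self-hosting settings to the library's `env` object.
function configureHosting(env, { modelPath, allowRemote = false } = {}) {
  env.allowLocalModels = true;         // resolve model IDs under env.localModelPath
  env.localModelPath = modelPath;      // e.g. '/models/' served alongside the app
  env.allowRemoteModels = allowRemote; // false: never fall back to the Hugging Face Hub
  return env;
}

// Hypothetical loader: configure hosting, then create a pipeline as usual.
async function loadSelfHosted(task, modelId) {
  const { env, pipeline } = await import('@huggingface/transformers');
  configureHosting(env, { modelPath: '/models/' });
  return pipeline(task, modelId);
}
```

Keeping the settings in one small function makes it easier to switch between Hub-hosted and self-hosted models per environment.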

Key Features

  • API parity with Python's transformers library for easy porting of ML pipelines
  • WebGPU acceleration can make in-browser inference substantially faster than the WebAssembly backend on supported hardware
  • Supports text, image, and audio models including Whisper and CLIP
  • Quantized model variants reduce download from hundreds of MB to tens of MB
  • No inference server cost: all inference runs client-side, so input data stays on the user's device

Comparison with Similar Tools

  • TensorFlow.js — general ML framework for JS; Transformers.js focuses specifically on Hugging Face model compatibility
  • ONNX Runtime Node — backend engine; Transformers.js adds the high-level pipeline API and model hub integration
  • MediaPipe — Google's on-device ML; limited to pre-built tasks rather than arbitrary HF models
  • Web LLM — focused on large language model chat; Transformers.js covers NLP, vision, and audio equally

FAQ

Q: Which browsers support WebGPU acceleration? A: Chrome 113+, Edge 113+, and Firefox Nightly. Safari support is in development. The WebAssembly fallback works in all modern browsers.

Q: Can I run large language models in the browser? A: Yes, for smaller models (up to ~1B parameters quantized). Larger models benefit from WebGPU but may exceed browser memory limits.

Q: How do I convert my own model? A: Use the optimum library: optimum-cli export onnx --model my-model ./output, then load the ONNX output in Transformers.js.

Q: Is there a size limit for browser deployment? A: Practically, keep total model weights under 500 MB for good user experience. Quantized small models (30-200 MB) work best.
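
A rough way to check a model against such a budget is to multiply the parameter count by the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate only: it ignores file metadata, quantization scales, and any layers left unquantized. The function names and the 66-million-parameter figure in the usage lines are hypothetical.

```javascript
// Approximate storage cost per parameter for common precisions.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1, q4: 0.5 };

// Estimate weight size in megabytes (1 MB = 1e6 bytes).
function estimateWeightsMB(paramCount, dtype) {
  const bytes = BYTES_PER_PARAM[dtype];
  if (bytes === undefined) throw new Error(`Unknown dtype: ${dtype}`);
  return (paramCount * bytes) / 1e6;
}

// Check the estimate against a download budget (defaults to 500 MB).
function fitsBudget(paramCount, dtype, budgetMB = 500) {
  return estimateWeightsMB(paramCount, dtype) <= budgetMB;
}

// Usage with a hypothetical 66-million-parameter model:
// estimateWeightsMB(66e6, 'fp32') gives 264, estimateWeightsMB(66e6, 'q8') gives 66
```

The 4x ratio between the two estimates is the same reduction that 8-bit quantization gives over 32-bit float weights.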
