Transformers.js — Run Hugging Face Models in the Browser

Introduction

Transformers.js brings the Hugging Face ecosystem to JavaScript. It runs ONNX-converted models directly in the browser via WebAssembly or WebGPU, and in Node.js without requiring Python or a GPU server. The API intentionally mirrors the Python transformers library for a familiar developer experience.

What Transformers.js Does

Runs 100+ pretrained models for NLP, vision, and audio tasks in JavaScript
Executes inference locally using ONNX Runtime Web (WebAssembly or WebGPU backend)
Supports pipelines for classification, generation, translation, summarization, and embeddings
Downloads and caches model weights automatically from the Hugging Face Hub
Works in browsers, Node.js, Deno, and edge runtimes like Cloudflare Workers
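
As a rough illustration of the pipeline style listed above, the sketch below runs sentiment analysis over a few strings. It assumes the `pipeline` export of `@huggingface/transformers` resolves to an async callable that returns an array of `{ label, score }` objects per input. The helper name `classifyTexts` and the injected `pipelineFn` parameter are hypothetical, chosen so the logic can be exercised without downloading a model.

```javascript
// Runs sentiment analysis over several texts.
// `pipelineFn` is expected to behave like `pipeline` from
// @huggingface/transformers: pipelineFn(task) resolves to an async callable
// that returns an array of { label, score } objects for one input string.
async function classifyTexts(pipelineFn, texts) {
  const classifier = await pipelineFn('sentiment-analysis');
  const results = [];
  for (const text of texts) {
    // Keep only the top-ranked prediction for each text.
    const [top] = await classifier(text);
    results.push({ text, label: top.label, score: top.score });
  }
  return results;
}

// Real usage (downloads a default model on first call, so it needs network
// access and an ES module context):
//   const { pipeline } = await import('@huggingface/transformers');
//   console.log(await classifyTexts(pipeline, ['I love Transformers.js!']));
```

Passing the factory in as an argument is only a convenience for testing; in application code, calling `pipeline` directly is simpler.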

Architecture Overview

Models are exported from PyTorch to ONNX format and optionally quantized to reduce size. At runtime, Transformers.js loads the ONNX graph into ONNX Runtime Web, which dispatches operations to WebAssembly SIMD or WebGPU compute shaders depending on availability. Tokenizers are implemented in JavaScript and are designed to produce the same output as their Python counterparts.
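
The availability check described above can be sketched as a small helper. This is a minimal sketch rather than the library's internal logic: `pickDevice` is a hypothetical name, and it assumes that the standard `navigator.gpu.requestAdapter()` call resolves to `null` when no usable GPU adapter exists.

```javascript
// Decides which backend to request: 'webgpu' when the browser exposes a
// usable WebGPU adapter, otherwise 'wasm'.
// `nav` is injected (normally globalThis.navigator) so the function can
// also run outside a browser.
async function pickDevice(nav) {
  if (nav && nav.gpu && typeof nav.gpu.requestAdapter === 'function') {
    try {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return 'webgpu';
    } catch (err) {
      // Adapter request failed; fall through to the WebAssembly backend.
    }
  }
  return 'wasm';
}
```

In version 3 of the package, the returned string can be passed as the `device` option when creating a pipeline, for example `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { device })`.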

Self-Hosting & Configuration

Install via npm: npm install @huggingface/transformers
Models download from the Hugging Face Hub on first use and are cached by the browser's Cache API, or on the filesystem in Node.js
Use quantized models (int8/uint8) to cut download size by roughly 4x relative to fp32 weights
Configure model loading with custom cache directories or self-hosted model repositories
Enable WebGPU backend for GPU-accelerated inference on supported browsers
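
The configuration options above can be sketched as follows. The sketch assumes the `env` object exported by `@huggingface/transformers` exposes `remoteHost`, `cacheDir`, and `allowRemoteModels`. The helper `configureSelfHosted` and the host `https://models.example.com` are hypothetical.

```javascript
// Applies self-hosting settings to an object shaped like the `env` export
// of @huggingface/transformers. Only the provided options are changed.
function configureSelfHosted(env, { host, cacheDir, offline = false } = {}) {
  if (host) {
    // Normalize to a trailing slash, matching the style of the default host URL.
    env.remoteHost = host.endsWith('/') ? host : host + '/';
  }
  if (cacheDir) {
    env.cacheDir = cacheDir; // filesystem cache location (Node.js)
  }
  if (offline) {
    env.allowRemoteModels = false; // only load models that are already local
  }
  return env;
}

// Real usage (assumes the package is installed):
//   const { env } = await import('@huggingface/transformers');
//   configureSelfHosted(env, { host: 'https://models.example.com', cacheDir: './.cache' });
```

Settings like these should be applied before the first pipeline is created, since model files are fetched when the pipeline loads.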

Key Features

API parity with Python's transformers library for easy porting of ML pipelines
WebGPU acceleration delivers near-native inference speed in the browser
Supports text, image, and audio models, including Whisper and CLIP
Quantized model variants reduce download from hundreds of MB to tens of MB
Zero server cost: all inference runs client-side with full user privacy
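
The size reduction from quantization follows from simple arithmetic on bytes per parameter. The sketch below uses an illustrative 66M-parameter model (a hypothetical figure, not taken from any specific checkpoint) and ignores graph metadata and any tensors that a quantized export keeps at higher precision.

```javascript
// Approximate bytes per parameter for common ONNX weight precisions.
// Real files are somewhat larger, so treat the results as rough estimates.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1, q4: 0.5 };

// Estimated weight size in megabytes (1 MB = 1e6 bytes).
function estimateWeightsMB(paramCount, dtype) {
  const bytes = BYTES_PER_PARAM[dtype];
  if (bytes === undefined) throw new Error(`Unknown dtype: ${dtype}`);
  return (paramCount * bytes) / 1e6;
}

const params = 66e6; // hypothetical 66M-parameter model
for (const dtype of Object.keys(BYTES_PER_PARAM)) {
  console.log(`${dtype}: ~${estimateWeightsMB(params, dtype)} MB`);
}
```

For this example the estimate falls from about 264 MB at fp32 to about 66 MB at 8-bit, which is the 4x reduction that moves a model from hundreds of MB to tens of MB.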

Comparison with Similar Tools

TensorFlow.js — general ML framework for JS; Transformers.js focuses specifically on Hugging Face model compatibility
ONNX Runtime Node — backend engine; Transformers.js adds the high-level pipeline API and model hub integration
MediaPipe — Google's on-device ML; limited to pre-built tasks rather than arbitrary HF models
Web LLM — focused on large language model chat; Transformers.js covers NLP, vision, and audio equally

FAQ

Q: Which browsers support WebGPU acceleration? A: Chrome 113+, Edge 113+, and Firefox Nightly. Safari support is in development. The WebAssembly fallback works in all modern browsers.

Q: Can I run large language models in the browser? A: Yes, for smaller models (up to ~1B parameters quantized). Larger models benefit from WebGPU but may exceed browser memory limits.

Q: How do I convert my own model? A: Use the optimum library: optimum-cli export onnx --model my-model ./output, then load the ONNX output in Transformers.js.

Q: Is there a size limit for browser deployment? A: Practically, keep total model weights under 500 MB for good user experience. Quantized small models (30-200 MB) work best.
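
The size guidance in the answers above can be turned into a quick check. This is a minimal sketch: `pickDtypeForBudget` is a hypothetical helper, the 500 MB default comes from the answer above, and the precision names are assumed to match the `dtype` values accepted by version 3 of the package. Sizes count weights only.

```javascript
// Precisions ordered from highest to lowest fidelity.
const PRECISIONS = [
  { dtype: 'fp32', bytesPerParam: 4 },
  { dtype: 'fp16', bytesPerParam: 2 },
  { dtype: 'q8', bytesPerParam: 1 },
  { dtype: 'q4', bytesPerParam: 0.5 },
];

// Returns the highest-precision option whose estimated weight size fits
// within the budget, or null when even 4-bit weights are too large.
function pickDtypeForBudget(paramCount, budgetMB = 500) {
  for (const { dtype, bytesPerParam } of PRECISIONS) {
    const sizeMB = (paramCount * bytesPerParam) / 1e6;
    if (sizeMB <= budgetMB) return { dtype, sizeMB };
  }
  return null;
}
```

Under these assumptions a 1B-parameter model only fits the 500 MB budget at 4-bit precision, which is consistent with the ~1B quantized ceiling mentioned above.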
