Cette page est affichée en anglais. Une traduction française est en cours.

SkillsMar 31, 2026·2 min de lecture

MLX — Apple Silicon ML Framework

MLX is an array framework for machine learning on Apple silicon by Apple Research. 24.9K+ GitHub stars. NumPy-like API, unified memory, lazy computation, autodiff. Python, C++, Swift. MIT licensed.

Script Depot · Community

Prêt pour agents

Installation agent prête

Cet actif peut être installé après choix du runtime, vérification du plan et exécution de la commande adaptée.

Native · 98/100Policy : autoriser

Surface agent

Tout agent MCP/CLI

Type

Skill

Installation

Single

Confiance

Confiance : Established

Point d'entrée

MLX — Apple Silicon ML Framework

Commande d'installation directe

npx -y tokrepo@latest install 26aa1d66-a2da-4be3-9c4e-7c33a4e3c398 --target codex

À exécuter après confirmation du plan en dry-run.

TL;DR

MLX provides a NumPy-like API for ML on Apple silicon with unified memory and lazy computation.

§01

What it is

MLX is an array framework for machine learning on Apple silicon, developed by Apple machine learning research. It provides a NumPy-like Python API with composable function transformations (autodiff, vectorization, optimization), lazy computation, dynamic graph construction, and a unified memory model where no manual data transfers between CPU and GPU are needed.

MLX supports Python, C++, C, and Swift, making it the go-to framework for training and inference on Mac hardware (M1/M2/M3/M4).

§02

How it saves time or tokens

MLX takes advantage of Apple silicon's unified memory architecture, eliminating the CPU-to-GPU data transfer overhead that plagues CUDA-based frameworks on Macs. Operations are evaluated lazily, meaning computations run only when results are needed, reducing unnecessary work. The mlx-lm companion package provides turnkey LLM inference with community-quantized models, letting you run large language models locally on a Mac in minutes.

§03

How to use

Install MLX:

pip install mlx

Run a quick GPU computation:

import mlx.core as mx

a = mx.random.normal((512, 512))
b = mx.random.normal((512, 512))
c = a @ b
mx.eval(c)
print(f'Result shape: {c.shape}')

Run LLM inference with mlx-lm:

pip install mlx-lm
mlx_lm.generate \
  --model mlx-community/Llama-3.2-3B-Instruct-4bit \
  --prompt 'Explain transformers in 3 sentences'

§04

Example

Training a simple neural network with MLX:

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def __call__(self, x):
        x = nn.relu(self.fc1(x))
        return self.fc2(x)

model = MLP()
optimizer = optim.Adam(learning_rate=1e-3)

def loss_fn(model, x, y):
    return nn.losses.cross_entropy(
        model(x), y
    ).mean()

loss_and_grad = nn.value_and_grad(model, loss_fn)

for batch in data_loader:
    loss, grads = loss_and_grad(model, batch['x'], batch['y'])
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)

§05

Related on TokRepo

Local LLM tools — Browse local LLM tools and frameworks on TokRepo.
AI tools for coding — Developer tools for ML and AI.

§06

Common pitfalls

MLX only runs efficiently on Apple silicon. On Intel Macs or Linux, use PyTorch with CUDA instead.
Forgetting to call mx.eval() means computations are deferred and never executed. MLX uses lazy evaluation; results are only computed when explicitly evaluated.
Community-quantized models on Hugging Face vary in quality. Stick to the mlx-community organization for well-tested quantizations.

Questions fréquentes

Does MLX work on Intel Macs?+

MLX technically runs on Intel Macs but without GPU acceleration, it is significantly slower. MLX is designed for Apple silicon (M1/M2/M3/M4) where it can use the GPU via Metal.

How does MLX compare to PyTorch?+

MLX has a similar API to PyTorch and NumPy but is optimized for Apple silicon's unified memory. PyTorch is more mature, has a larger ecosystem, and supports CUDA GPUs. Use MLX for local Mac development and PyTorch for cloud/NVIDIA workflows.

Can I run LLMs locally with MLX?+

Yes. The mlx-lm package provides inference for quantized models from Hugging Face. Run models like Llama, Mistral, and Phi locally on your Mac with a single command.

What is unified memory in MLX?+

Apple silicon shares memory between CPU and GPU. MLX takes advantage of this by avoiding data copies between devices. Arrays can be used on both CPU and GPU without explicit transfers, which is a significant performance advantage.

Does MLX support training, not just inference?+

Yes. MLX provides autodiff (automatic differentiation), optimizers (Adam, SGD), and neural network modules (Linear, Conv, Attention). You can train models from scratch or fine-tune pre-trained models.

Sources citées (3)

MLX GitHub— MLX array framework by Apple Research
MLX Documentation— MLX documentation and examples
mlx-lm GitHub— mlx-lm for LLM inference on Apple silicon

En lien sur TokRepo

Local LLM tools Coding tools MLX deep-dive

🙏

Source et remerciements

Created by Apple ML Research. Licensed under MIT. ml-explore/mlx — 24,900+ GitHub stars

Fil de discussion

Connectez-vous pour rejoindre la discussion.

Aucun commentaire pour l'instant. Soyez le premier à partager votre avis.

Actifs similaires

llama.cpp — Run LLMs Locally in Pure C/C++

llama.cpp is a C/C++ LLM inference engine with 100K+ GitHub stars. Runs on CPU, Apple Silicon, NVIDIA, AMD GPUs. 1.5-8 bit quantization, no dependencies, supports 50+ model architectures. MIT licensed

Skills

Script Depot

whisper.cpp — Local Speech-to-Text in Pure C/C++

High-performance port of OpenAI Whisper in C/C++. No Python, no GPU required. Runs on CPU, Apple Silicon, CUDA, and even Raspberry Pi. Real-time transcription.

代码Skills

Script Depot

CoreML Tools — Convert ML Models to Apple CoreML Format

A Python package for converting trained models from TensorFlow, PyTorch, and other frameworks into Apple's CoreML format for on-device inference on iPhone, iPad, and Mac.

Scripts

Script Depot

tinygrad — Minimalist Deep Learning Framework

tinygrad is a minimalist deep learning framework in under 10,000 lines of code. It provides a simple, hackable tensor library with automatic differentiation and multi-backend support spanning CPU, GPU, Apple Metal, and custom accelerators.

Skills

AI Open Source