How do I install Fireworks Fine-Tuning — Serverless LoRA on Llama in 30 min?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Fireworks Fine-Tuning — Serverless LoRA on Llama in 30 min

Overview

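Fireworks fine-tuning runs serverless LoRA on Llama 3.x, Qwen 2.5 and Mixtral. You upload a JSONL training file with the Firectl CLI and wait 30-60 minutes. The fine-tuned model is then served from the same OpenAI-compatible endpoint under a different model ID. There is no GPU to rent and no idle hosting fee. It suits classification heads on top of Llama 8B, instruction-following adapters, domain tone tuning, and distilling GPT-4o behavior into a cheaper base model. Any client that already calls Fireworks can use the result. Setup time: 30 minutes from JSONL to a live model.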

Prepare the training data (JSONL)

{"messages":[{"role":"system","content":"Triage customer support tickets into urgent / billing / general."},{"role":"user","content":"My card was charged twice"},{"role":"assistant","content":"billing"}]}
{"messages":[{"role":"system","content":"Triage customer support tickets into urgent / billing / general."},{"role":"user","content":"The site has been down for an hour"},{"role":"assistant","content":"urgent"}]}
{"messages":[{"role":"system","content":"Triage customer support tickets into urgent / billing / general."},{"role":"user","content":"How do I export my data?"},{"role":"assistant","content":"general"}]}
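Each line is one chat-format example: the system prompt, the user ticket, and the assistant label the model should learn to produce. A minimal sketch of generating such a file from labeled tickets, assuming your data is a list of (ticket, label) pairs. `build_example` and `write_jsonl` are hypothetical helpers, not part of any Fireworks SDK.

```python
import json
from typing import Iterable, Tuple

# Repeated verbatim in every example; send the same prompt at inference time.
SYSTEM_PROMPT = "Triage customer support tickets into urgent / billing / general."
LABELS = {"urgent", "billing", "general"}


def build_example(ticket: str, label: str) -> dict:
    """Turn one labeled ticket into a chat-format training record."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket},
            {"role": "assistant", "content": label},
        ]
    }


def write_jsonl(pairs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for ticket, label in pairs:
            # ensure_ascii=False keeps non-English tickets readable in the file
            f.write(json.dumps(build_example(ticket, label), ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_jsonl([("My card was charged twice", "billing")], "train.jsonl")` writes a one-line file and returns 1. Validating labels at write time catches typos like `biling` before you pay for a training run.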

The LoRA sweet spot is 200-2,000 examples. Below 100 the adapter tends to underfit; above 5,000, most domain and tone tasks see diminishing returns.
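Before uploading, it is worth confirming the file parses and checking where its size falls against those thresholds. A minimal sketch; the structural checks (known roles, string content, last message from the assistant) are our own assumptions about a well-formed example, not Fireworks' official validator, and `count_examples` / `size_advice` are hypothetical names.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def count_examples(path: str) -> int:
    """Parse a chat-format JSONL file and return the number of examples.

    Raises ValueError (json.JSONDecodeError is a subclass) on the first
    malformed line. Blank lines are skipped.
    """
    n = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {lineno}: missing or empty 'messages' list")
            for m in messages:
                if (not isinstance(m, dict) or m.get("role") not in VALID_ROLES
                        or not isinstance(m.get("content"), str)):
                    raise ValueError(f"line {lineno}: bad message {m!r}")
            if messages[-1]["role"] != "assistant":
                raise ValueError(f"line {lineno}: last message should be the assistant target")
            n += 1
    return n


def size_advice(n: int) -> str:
    """Bucket an example count using the rules of thumb above."""
    if n < 100:
        return "likely to underfit"
    if 200 <= n <= 2000:
        return "LoRA sweet spot"
    if n > 5000:
        return "diminishing returns for most domain-tone tasks"
    return "outside the 200-2,000 sweet spot"
```

Usage: `n = count_examples("train.jsonl"); print(n, size_advice(n))`.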

Submit the job (Firectl CLI)

# Install and sign in
pip install fireworks-ai
firectl signin

# Upload the dataset
firectl create dataset support-triage --file train.jsonl

# Start the fine-tuning job
firectl create fine-tuning-job \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset support-triage \
  --output-model my-support-triage-v1 \
  --epochs 3 \
  --learning-rate 0.0001
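If you rerun the job with different hyperparameters, building the command in a script keeps runs reproducible. A minimal sketch that assembles exactly the `firectl create fine-tuning-job` call above (same flags, nothing added) and only executes it when `dry_run=False`. `build_finetune_argv` and `submit` are hypothetical helpers; confirm the flags against your installed firectl with `firectl create fine-tuning-job --help`.

```python
from __future__ import annotations

import shlex
import subprocess


def build_finetune_argv(base_model: str, dataset: str, output_model: str,
                        epochs: int = 3, learning_rate: float = 1e-4) -> list[str]:
    """Assemble the same firectl fine-tuning command shown above."""
    return [
        "firectl", "create", "fine-tuning-job",
        "--base-model", base_model,
        "--dataset", dataset,
        "--output-model", output_model,
        "--epochs", str(epochs),
        "--learning-rate", str(learning_rate),  # str(1e-4) == "0.0001"
    ]


def submit(argv: list[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command; run it (raising on failure) only if dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
```

Usage: `submit(build_finetune_argv("accounts/fireworks/models/llama-v3p1-8b-instruct", "support-triage", "my-support-triage-v1"))` prints the command without running it.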

Use the fine-tuned model

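import os
from openai import OpenAI
client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key=os.environ["FIREWORKS_API_KEY"])  # OpenAI-compatible Fireworks endpoint
resp = client.chat.completions.create(
    model="accounts/<your_account>/models/my-support-triage-v1",
    messages=[{"role": "system", "content": "Triage customer support tickets into urgent / billing / general."},  # same prompt as in training
              {"role": "user", "content": "My refund hasn't arrived"}],
)
print(resp.choices[0].message.content)  # → "billing"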
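Because the adapter was trained to answer with a single label, a small client-side guard keeps a stray `"Billing."` or trailing newline from breaking downstream routing. A minimal sketch; `normalize_label` and the fallback to `general` are our own hypothetical choices, not Fireworks behavior.

```python
LABELS = ("urgent", "billing", "general")


def normalize_label(raw: str, default: str = "general") -> str:
    """Map raw model output onto the closed label set.

    Strips surrounding whitespace, quotes and periods, lowercases, and
    returns `default` when the result is not a known label.
    """
    cleaned = raw.strip().strip("\"'.").lower()
    return cleaned if cleaned in LABELS else default
```

Usage: `label = normalize_label(resp.choices[0].message.content)`. Falling back to the least urgent queue is one option; raising or logging unknown outputs instead makes drift in the adapter easier to spot.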

Cost profile (May 2026)

Item	Cost
Training	~$0.50 per million training tokens
Hosted inference (deployed LoRA)	Same price as the base model
Idle hosting fee	$0
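To turn the training price into a budget, multiply the tokens you train on by the per-million rate. A minimal sketch, assuming billed training tokens = dataset tokens × epochs (check Fireworks' pricing docs for how your base model is billed); `estimate_training_cost` is a hypothetical helper and the 60-tokens-per-example figure below is illustrative, not measured.

```python
def estimate_training_cost(dataset_tokens: int, epochs: int,
                           usd_per_million: float = 0.50) -> float:
    """Rough training cost in USD, assuming every epoch is billed in full."""
    if dataset_tokens < 0 or epochs < 1:
        raise ValueError("dataset_tokens must be >= 0 and epochs >= 1")
    billed_tokens = dataset_tokens * epochs
    return billed_tokens / 1_000_000 * usd_per_million


# Hypothetical: 1,000 short triage examples at ~60 tokens each, 3 epochs as in the job above.
cost = estimate_training_cost(dataset_tokens=1_000 * 60, epochs=3)
print(f"${cost:.2f}")  # → $0.09
```

At this scale training cost is negligible; inference volume, billed at the base-model rate, is what dominates.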

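Fine-tuning or prompt engineering?

Situation	Use
A 4-shot prompt already gives the right answer	Prompt
Output must match a specific format exactly	Fine-tune
Domain terminology and tone must stay consistent	Fine-tune
The latency budget can't fit few-shot examples	Fine-tune
Fewer than 50 training examples	Prompt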
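The table can be read as a small decision rule. This sketch encodes it with one precedence choice of ours that the table does not state: with fewer than 50 examples you prompt regardless, otherwise any of the three fine-tuning signals wins, and the default (including "a 4-shot prompt already works") is prompting. `TaskSignals` and `choose_approach` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    needs_exact_format: bool            # output must match a format exactly
    needs_domain_tone: bool             # consistent terminology and tone
    few_shot_over_latency_budget: bool  # few-shot examples don't fit the latency budget
    num_examples: int                   # labeled examples available for training


def choose_approach(s: TaskSignals) -> str:
    """Return "prompt" or "fine-tune" following the table above."""
    if s.num_examples < 50:
        return "prompt"  # too little data to train on
    if s.needs_exact_format or s.needs_domain_tone or s.few_shot_over_latency_budget:
        return "fine-tune"
    return "prompt"  # nothing argues for training; keep iterating on the prompt
```

Usage: `choose_approach(TaskSignals(True, False, False, 800))` returns `"fine-tune"`.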

FAQ

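Q: How long does training take? A: A typical 1K-example LoRA on Llama 8B takes 30-60 minutes. Larger datasets or a 70B base model take 2-4 hours. Firectl and the dashboard both show live job status.

Q: Can I download the fine-tuned weights? A: The LoRA adapter, yes: Firectl exports it as safetensors. The base model can't be redistributed, but the adapter you trained is yours, which helps if you later want to run the same LoRA on self-managed GPUs.

Q: Is full fine-tuning (not LoRA) supported? A: The serverless tier currently supports LoRA only. Full fine-tuning is available on Fireworks dedicated deployments, with GPUs rented by the hour. For most domain tuning, LoRA is the right trade-off.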


Related assets

Fireworks Inference — 100+ Open Models on OpenAI-Compat API

GroqCloud Quickstart — 250 tokens/sec OpenAI-Compat API

SWE-bench — Benchmark for Coding Agents

Weave — Trace and Debug LLM Apps