# Fireworks Fine-Tuning — Serverless LoRA on Llama in 30 min

> Fireworks runs serverless LoRA fine-tuning on Llama, Qwen, and Mixtral. Upload a JSONL dataset and get a deployed fine-tune in about 30 minutes on the same endpoint.

## Quick Use

1. `pip install fireworks-ai && firectl signin`
2. Prepare a JSONL file with one `{"messages": [...]}` object per line
3. `firectl create fine-tuning-job --base-model llama-v3p1-8b-instruct --dataset NAME`

---

## Intro

Fireworks Fine-Tuning runs serverless LoRA on Llama 3.x, Qwen 2.5, and Mixtral. Upload a JSONL training file via the firectl CLI, wait 30-60 minutes, and your fine-tune is deployed at the same OpenAI-compatible endpoint under a new model ID. No GPU rental, no idle hosting fee.

Best for: classification heads on top of Llama 8B, instruction-following adapters, domain-tone tuning, distilling GPT-4o behavior into a cheap base model.

Works with: any client that hits Fireworks.

Setup time: 30 minutes from JSONL to live model.

---

### Prepare training data (JSONL)

```jsonl
{"messages":[{"role":"system","content":"Classify support tickets as urgent / billing / general."},{"role":"user","content":"My card was charged twice"},{"role":"assistant","content":"billing"}]}
{"messages":[{"role":"system","content":"Classify support tickets as urgent / billing / general."},{"role":"user","content":"Site down for an hour"},{"role":"assistant","content":"urgent"}]}
{"messages":[{"role":"system","content":"Classify support tickets as urgent / billing / general."},{"role":"user","content":"How do I export data?"},{"role":"assistant","content":"general"}]}
```

200-2,000 examples is the sweet spot for LoRA. Below 100 examples the model tends to underfit; above 5,000 you hit diminishing returns for most domain-tone tasks.
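The chat format above is easy to get subtly wrong at scale (a missing role, a non-string content field), and a malformed line can fail the whole upload. A minimal validation sketch; the expectation that the last message is the assistant target reflects the examples above and is an assumption, not a documented firectl requirement:

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_line(line: str) -> None:
    """Raise ValueError if a JSONL line is not a valid chat example."""
    record = json.loads(line)
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("each line needs a non-empty 'messages' list")
    for msg in messages:
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"unknown role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError("every message needs string 'content'")
    if messages[-1]["role"] != "assistant":
        raise ValueError("last message should be the assistant target")

def validate_file(path: str) -> int:
    """Validate every non-blank line; return the number of examples."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                validate_line(line)
            except ValueError as err:
                raise ValueError(f"line {lineno}: {err}") from err
            count += 1
    return count
```

Running `validate_file("train.jsonl")` before `firectl create dataset` also gives you the example count, which you can check against the 200-2,000 sweet spot.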
### Submit job (firectl CLI)

```bash
# Install + log in
pip install fireworks-ai
firectl signin

# Upload dataset
firectl create dataset support-triage --file train.jsonl

# Launch fine-tune
firectl create fine-tuning-job \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset support-triage \
  --output-model my-support-triage-v1 \
  --epochs 3 \
  --learning-rate 0.0001
```

### Use the fine-tune

```python
from openai import OpenAI

# Fireworks serves an OpenAI-compatible endpoint; point any OpenAI client at it.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

resp = client.chat.completions.create(
    # Replace <account-id> with your Fireworks account ID.
    model="accounts/<account-id>/models/my-support-triage-v1",
    messages=[{"role": "user", "content": "Refund didn't go through"}],
)
print(resp.choices[0].message.content)  # → "billing"
```

### Cost characteristics (May 2026)

| Item | Cost |
|---|---|
| Training | ~$0.50 per 1M training tokens |
| Hosted inference (deployed LoRA) | Same as the base-model rate |
| Idle hosting fee | $0 |

### When to fine-tune vs. prompt-engineer

| Symptom | Use |
|---|---|
| Model gets the right answer with a 4-shot prompt | **Prompt** |
| Need to match a specific output format exactly | **Fine-tune** |
| Domain jargon and tone consistency | **Fine-tune** |
| Latency budget can't fit few-shot examples in context | **Fine-tune** |
| Training data under 50 examples | **Prompt** |

---

### FAQ

**Q: How long does training take?**
A: 30-60 minutes for a typical 1K-example LoRA run on Llama 8B. Larger datasets or a 70B base model can take 2-4 hours. firectl shows live progress, and status is also visible on the dashboard.

**Q: Can I download my fine-tune weights?**
A: Yes, for LoRA adapters: firectl exports the safetensors. The base model isn't redistributable, but the adapter you trained is yours. That's useful if you later want to host the same LoRA on a self-managed GPU.

**Q: Does it support full fine-tuning (not LoRA)?**
A: Currently LoRA-only on the serverless plan. Full fine-tuning is available on Fireworks dedicated deployments, where you rent GPUs hourly. For most domain-tuning tasks, LoRA is the right tradeoff.
---

## Source & Thanks

> Built by [Fireworks AI](https://github.com/fw-ai). Fine-tuning docs at [docs.fireworks.ai/fine-tuning](https://docs.fireworks.ai/fine-tuning).
>
> The firectl CLI is MIT-licensed.

---

Source: https://tokrepo.com/en/workflows/fireworks-fine-tuning-serverless-lora-on-llama-in-30-min
Author: Fireworks AI