Indie Game Dev AI Pack
Ten picks for the solo or small-team indie shipping on Unity / Godot / Unreal: Aseprite + Blender MCP for art, AudioCraft for music, Chatterbox + Cartesia for NPC voices, Unity / Unreal / Godot for the engine, a Game Designer agent for mechanics, LibreTranslate for localization — covering art / audio / NPC / level / localize end to end.
What's in this pack
Five layers. Two picks per layer except where one tool already owns the layer:
- Art (concept → 2D sprites → 3D models) — Aseprite for pixel art and frame animation; Blender MCP to drive Blender from a Claude / Cursor agent for 3D modeling, texturing, and rigging via prompts.
- Audio (music + SFX + NPC voice) — AudioCraft (Meta) for music loops and ambient SFX; Chatterbox as the open TTS for cutscene narration; Cartesia Sonic TTS for live in-game NPC voice where the 75ms time-to-first-audio matters.
- Engine bridge (Unity / Unreal / Godot) — Unity MCP and Unreal MCP let agents drive your editor directly (scene manipulation, asset import, script edits); Godot is the open-source engine itself for indies who don't want a license royalty cliff.
- Level + mechanics design — the Game Designer Claude Code agent: a structured prompt + system message that reasons about player loops, balancing, difficulty curves, and economy design instead of just generating filler text.
- Localization — LibreTranslate as a self-hosted MT layer for getting a first-pass translation across 30+ languages before a human polish; pairs with your engine's i18n CSV.
If you're shipping a 10-minute jam game, you don't need this pack — itch.io and your existing toolchain are fine. This pack is for the case where you're trying to actually finish and release a small commercial indie in a year, where every layer below has been a Reddit thread about "how do indies make their own X."
Install in this order
- Aseprite — pixel art + animation. Cheap, owns the workflow for 2D indies. Start here even if your final game is 3D; you'll use it for icons, UI, and concept boards.
- Blender MCP — 3D modeling for agents. Drives Blender from Claude Code / Cursor with natural language. The fastest way for a non-3D-artist solo dev to get serviceable low-poly assets without learning Blender shortcuts first.
- AudioCraft — music + ambient SFX generation. Meta's open model family (MusicGen + AudioGen). Run locally on a 12GB+ GPU; the music is loopable and you can train on your own reference tracks for stylistic consistency.
- Chatterbox — open-source TTS for cutscene narration. Use for one-shot pre-rendered lines in cutscenes, intro/outro narration, codex entries — anywhere you can bake an audio file at build time.
- Cartesia Sonic TTS — low-latency in-game NPC voice. ~75ms time-to-first-audio makes this viable for dynamic NPC dialogue triggered by player action; not the cheapest, but the only one fast enough.
- Unity MCP or Unreal MCP — pick by your engine. Lets your AI agent open scenes, import meshes, attach scripts, and run builds without you clicking through the editor. Saves entire afternoons on repetitive scene setup.
- Godot — the engine itself, if you haven't committed to Unity / Unreal. Open-source, no royalty, lightweight, the indie default in 2026 for new projects without legacy ties.
- Game Designer agent — Claude Code agent specialized in mechanics, balance, and player progression. Use during the design doc phase and at every milestone where you need to ask "is this fun, or just busy?"
- LibreTranslate — self-hosted machine translation for your UI / dialogue strings. Run on your laptop, no API bill, no privacy leak. First-pass MT only; budget a few hundred dollars for human polish per language at launch.
How they fit together
Concept doc + Game Designer agent
│
├──> Art pipeline
│ Aseprite (2D / UI) ──┐
│ Blender MCP (3D) ──┼──> engine assets
│ │
├──> Audio pipeline
│ AudioCraft (music/SFX) ──┐
│ Chatterbox (cutscene) │
│ Cartesia (live NPC) ├──> engine assets
│ │
└──> Engine (Unity / Unreal / Godot via MCP)
│
├──> build & test cycle
└──> i18n CSV ──> LibreTranslate ──> shipped languages
The critical insight: audio is cheap to redo, art is not. Lock the art style first (Aseprite palette, Blender material library), then layer audio generated to fit that art. Don't generate 200 music loops before you know what your game looks like.
Tradeoffs you'll hit
- Commercial license check is non-negotiable — AudioCraft (Meta) is research-licensed in places; Cartesia is paid commercial; Chatterbox is permissive (check current LICENSE before ship). Make a
LICENSES.mdin your repo on day one and update it every time you add a tool. - Style consistency across generated assets — generated sprites from one prompt won't match generated 3D models from another, and neither will match generated music. Lock a reference palette + mood board before generating, and pass it as conditioning where the tool supports it.
- Engine integration is the real cost — getting an AI-generated 3D model into Unity / Unreal with correct scale, pivot, and material slots is half the work. Plan for an import script per asset type, not per asset.
- GPU requirements stack up — AudioCraft (12GB+), Cartesia (cloud), Aseprite (none), Blender (anything), Chatterbox (8GB+). One 4090 covers everything local; below 12GB plan to push audio gen to Replicate.
- "Fully AI indie" is still a meme — every shipped "AI-only" indie at scale has had a human editing every asset. Plan AI as a 5-10x amplifier on you, not a replacement for taste.
Common pitfalls
- Asset style drift — different prompts → different art directions → uncanny game. Fix: write a one-page style guide (palette hex codes, line weight, lighting mood, audio key/tempo) and prepend it to every generation prompt.
- Shipping without checking the LICENSE — "open" models often forbid commercial use or require attribution. Check the LICENSE file in the GitHub repo of every model, not the README. Steam will refund-flag your game if you can't show clean licensing.
- NPC dialogue that's clearly LLM-generated — "I sense your presence, traveler" tells. Fix: write 5-10 personality anchors per NPC (vocabulary they use, things they never say, a verbal tic), and inject them into every dialogue prompt. Bake offline; don't run a frontier model at gameplay time unless you have to.
- Localized strings overflowing UI boxes — German is ~30% longer than English, Russian similar; Japanese / Chinese ~30% shorter but with different line-break rules. Build your UI to auto-resize text containers from day one, not as a launch-week bug fix.
- Music loops that don't loop seamlessly — AudioCraft outputs raw audio; you have to cut the loop point manually in Audacity. Plan a 30-second "intro + loop body + 4-bar outro" structure per track; trying to loop a single 30s blob is where 80% of jam-game audio sounds bad.
10 assets in this pack
Frequently asked questions
Did Midjourney solve the commercial license problem in 2026?
Midjourney Pro / Mega plans grant commercial rights to images you generate, but you still need to check the current ToS yourself before shipping anything to Steam. AudioCraft (Meta) and Stable Diffusion derivatives have their own per-model LICENSE files that override the platform terms — read those. The safe rule: assume nothing is commercial-clean until you've read the LICENSE in the source repo, not the marketing page.
Unity vs Godot — which has more AI tooling in 2026?
Unity has more total tooling because of asset-store momentum (Unity MCP, ML-Agents, dozens of asset-store integrations). Godot has fewer but cleaner AI integrations because the engine itself is open-source and trivial to script against. For a solo indie starting fresh and worried about royalty cliffs, Godot is the safer pick; for a team with Unity muscle memory, Unity + Unity MCP is faster to ship with.
How do I keep NPC dialogue from feeling robotic / out-of-character?
Three things: (1) write 5-10 personality anchors per NPC (specific vocabulary, things they never say, a verbal tic) and inject them into every dialogue prompt; (2) bake dialogue offline at build time rather than generating live at gameplay time — you can curate the outputs; (3) reuse a small set of human-written lines for emotional beats (death, win, betrayal) and let the AI handle filler. The combination is much stronger than any single prompt-engineering trick.
What's the cheapest path to localization for an indie?
Self-host LibreTranslate on your laptop, run your i18n CSV through it for a first-pass translation across 5-10 target languages, then pay a native speaker on Fiverr / Upwork ~$0.05-0.10 per word for QA on the top 3 languages you actually care about. Total cost for a 5000-word indie: ~$250-500 for human polish on 3 langs, $0 for MT pass on the rest. Don't ship Chinese / Japanese / Arabic without a human pass — script direction and grammar break MT-only output in obvious ways.
Can I ship a fully-AI indie game in 2026?
Mechanically yes, commercially questionable. Every shipped "AI-only" indie has had a human editing every asset, writing the design doc, balancing the economy, and fixing the LLM's broken dialogue. AI is realistically a 5-10x amplifier on a solo dev with taste — not a replacement for the dev. Steam's policy as of 2026 requires you to disclose AI-generated content, and players are increasingly skeptical of fully-AI titles. Plan to ship as a human-led indie that used AI heavily, not an AI-led project with a human handler.
12 packs · 80+ hand-picked assets
Browse every curated bundle on the home page
Back to all packs