Prompts · Apr 7, 2026 · 2 min read

Granite Code — IBM Open Source AI Coding Models

IBM's open-source code LLMs from 3B to 34B parameters. Trained on 116 programming languages with Apache 2.0 license. Competitive performance on code benchmarks.

TL;DR
IBM open-source code models spanning 3B to 34B parameters, trained on 116 programming languages under Apache 2.0.
§01

What it is

Granite Code is IBM's family of open-source large language models purpose-built for code tasks. The model lineup ranges from 3B to 34B parameters, covering code generation, code explanation, bug detection, and completion across 116 programming languages. All models ship under the Apache 2.0 license, making them freely usable in commercial products.

Developers working on code assistance tooling, IDE integrations, or offline AI coding setups benefit most from Granite Code. The smaller 3B and 8B variants run on consumer hardware, while the 34B model targets teams with GPU infrastructure who need higher accuracy.

§02

How it saves time or tokens

Granite Code reduces token costs by offering locally runnable models that skip API billing entirely. Running the 8B variant on a single GPU eliminates per-token charges from hosted APIs. For teams processing large codebases, this translates to significant savings on repetitive tasks like docstring generation, test scaffolding, and code review assistance.

The smaller models also skip the network round-trip to a cloud API, which can substantially cut time-to-first-token on local hardware.
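To make the savings concrete, here is a back-of-envelope comparison. The hosted price, token budget, and codebase size below are illustrative assumptions, not quotes from any provider:

```python
# Illustrative only: these numbers are assumptions, not real pricing.
PRICE_PER_1K_TOKENS = 0.01   # assumed hosted-API price in USD
TOKENS_PER_DOCSTRING = 600   # rough prompt + completion budget per function
FUNCTIONS = 5_000            # functions in a large codebase

hosted_cost = FUNCTIONS * TOKENS_PER_DOCSTRING / 1000 * PRICE_PER_1K_TOKENS
print(f'Hosted API: ${hosted_cost:.2f} per docstring pass; local model: $0 marginal')
```

Under these assumptions a single docstring pass costs $30 via a hosted API, and that cost recurs every time the task is rerun, while the local model's marginal token cost stays at zero.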

§03

How to use

  1. Download the desired model size from Hugging Face (IBM publishes all variants under the ibm-granite organization, e.g. ibm-granite/granite-8b-code-instruct)
  2. Load the model using transformers or serve it through Ollama or vLLM for API-compatible access
  3. Send code prompts in the format the model expects (instruction-tuned variants accept chat-style messages)
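For step 2, both vLLM and Ollama expose an OpenAI-compatible chat-completions endpoint. The sketch below builds the request payload for such a server; the model name and parameter values are assumptions to adapt to your local setup:

```python
def build_chat_request(prompt, model='ibm-granite/granite-8b-code-instruct'):
    """Build a chat-completions payload for an OpenAI-compatible server.

    POST this as JSON to your local server's /v1/chat/completions
    endpoint (e.g. http://localhost:8000 for a default vLLM install).
    """
    return {
        'model': model,
        'messages': [{'role': 'user', 'content': prompt}],
        'max_tokens': 256,
        'temperature': 0.2,
    }

payload = build_chat_request('Write a Python function that merges two sorted lists.')
print(payload['model'])
```

The same payload works against either backend, so you can swap vLLM for Ollama without changing client code.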
§04

Example

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'ibm-granite/granite-8b-code-instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Instruct variants are tuned on chat-style messages, so format the
# prompt with the model's chat template rather than sending raw text.
messages = [{'role': 'user', 'content': 'Write a Python function that merges two sorted lists.'}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt')

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
§05

Related on TokRepo

  • Local LLM tools — Compare Granite Code with other locally-runnable model setups
  • AI coding tools — Browse coding assistants and frameworks on TokRepo
§06

Common pitfalls

  • The base models are not instruction-tuned; use the -instruct variants for chat-style prompts
  • The 34B model needs far more VRAM than a single consumer GPU provides at full precision; quantized versions (GPTQ/GGUF) substantially reduce this requirement
  • Context window is 8K tokens for most variants, which limits processing very large files in a single pass
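The 8K-token ceiling means very large files must be split before processing. A minimal chunking sketch, using the rough heuristic of ~4 characters per token (a real pipeline would count tokens with the model's tokenizer instead):

```python
def chunk_source(text, max_tokens=8000, chars_per_token=4):
    """Split source code into line-aligned chunks that fit a token budget.

    Approximates token count as len(chunk) / chars_per_token, which is a
    crude heuristic; swap in the model tokenizer for exact counts.
    """
    max_chars = max_tokens * chars_per_token
    lines = text.splitlines(keepends=True)
    chunks, current, size = [], [], 0
    for line in lines:
        if size + len(line) > max_chars and current:
            chunks.append(''.join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(''.join(current))
    return chunks

big_file = 'def f():\n    pass\n' * 40_000  # a synthetic oversized file
chunks = chunk_source(big_file)
print(len(chunks))
```

Splitting on line boundaries keeps each chunk syntactically readable; for tasks like docstring generation, splitting on function boundaries instead preserves even more context per chunk.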

Frequently Asked Questions

What programming languages does Granite Code support?

Granite Code is trained on 116 programming languages. The training data covers mainstream languages like Python, Java, JavaScript, C++, Go, and Rust, as well as less common ones. Performance is strongest on languages with the most training data.

Can I run Granite Code on a laptop without a GPU?

The 3B variant can run on CPU with quantization (GGUF format via llama.cpp or Ollama), though inference is slower. The 8B model benefits from a dedicated GPU. The 34B model requires server-grade hardware.

How does Granite Code compare to Code Llama?

Both are open-source code LLMs. Granite Code focuses on enterprise use cases with Apache 2.0 licensing and IBM support. Code Llama uses a custom Meta license with some commercial restrictions. Benchmark performance varies by task and model size.

Is Granite Code free for commercial use?

Yes. All Granite Code models are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution without royalties or usage fees.

What context window does Granite Code support?

Most Granite Code variants support an 8K token context window. Some newer releases extend this. Check the specific model card on Hugging Face for the exact context length of each variant.


Source & Thanks

Created by IBM Research. Licensed under Apache 2.0.

ibm-granite/granite-code-models — 2k+ stars
