Granite Code — IBM Open Source AI Coding Models
IBM's open-source code LLMs from 3B to 34B parameters. Trained on 116 programming languages and released under the Apache 2.0 license, with strong results on code generation benchmarks.
What it is
Granite Code is IBM's family of open-source large language models purpose-built for code tasks. The model lineup ranges from 3B to 34B parameters, covering code generation, code explanation, bug detection, and completion across 116 programming languages. All models ship under the Apache 2.0 license, making them freely usable in commercial products.
Developers working on code assistance tooling, IDE integrations, or offline AI coding setups benefit most from Granite Code. The smaller 3B and 8B variants run on consumer hardware, while the 34B model targets teams with GPU infrastructure who need higher accuracy.
How it saves time or tokens
Granite Code reduces token costs by offering locally runnable models that skip API billing entirely. Running the 8B variant on a single GPU eliminates per-token charges from hosted APIs. For teams processing large codebases, this translates to significant savings on repetitive tasks like docstring generation, test scaffolding, and code review assistance.
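As a back-of-envelope illustration of how those savings compound at batch scale (all figures below are hypothetical placeholders, not real pricing):

```python
# Rough cost comparison: hosted API vs. local model (all numbers hypothetical).
tokens_per_task = 1_500           # e.g. one docstring-generation request
tasks_per_day = 2_000             # a large-codebase batch job
api_price_per_1k_tokens = 0.002   # assumed hosted-API rate in USD

daily_api_cost = tokens_per_task * tasks_per_day / 1_000 * api_price_per_1k_tokens
print(f"Hosted API: ${daily_api_cost:.2f}/day; local 8B model: no per-token cost")
```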
The smaller models also avoid the round trip to cloud APIs, removing network and queueing latency; actual generation speed still depends on your local hardware.
How to use
- Download the desired model size from Hugging Face (IBM publishes all variants under `ibm-granite/granite-code-*`)
- Load the model with `transformers`, or serve it through Ollama or vLLM for API-compatible access (see the sketch after this list)
- Send code prompts in the format the model expects (instruction-tuned variants accept chat-style messages)
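A minimal sketch of API-compatible access, assuming a local vLLM server already running with `vllm serve ibm-granite/granite-8b-code-instruct` on vLLM's default port (the endpoint and prompt here are illustrative):

```python
# Query a local vLLM server through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="ibm-granite/granite-8b-code-instruct",
    messages=[{"role": "user", "content": "Explain what this does: [x*x for x in range(10)]"}],
)
print(response.choices[0].message.content)
```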
Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'ibm-granite/granite-8b-code-instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Instruct variants expect chat-style messages, so apply the chat template
messages = [{'role': 'user', 'content': 'Write a Python function that merges two sorted lists.'}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt')

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
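On a GPU machine, passing `device_map='auto'` to `from_pretrained` (with the `accelerate` package installed) places the model weights automatically; without it, the model loads on CPU.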
Related on TokRepo
- Local LLM tools — Compare Granite Code with other locally runnable model setups
- AI coding tools — Browse coding assistants and frameworks on TokRepo
Common pitfalls
- The base models are not instruction-tuned; use the `-instruct` variants for chat-style prompts
- The 34B model requires at least 24GB VRAM; quantized versions (GPTQ/GGUF) reduce this requirement
- Context window is 8K tokens for most variants, which limits processing very large files in a single pass (see the chunking sketch below)
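One way around the single-pass limit is to split large files into overlapping, token-bounded chunks; a minimal sketch, assuming the `transformers` tokenizer and illustrative (not tuned) chunk sizes:

```python
# Split source text into overlapping token-bounded chunks so each fits
# within the model's 8K context window (sizes are illustrative).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ibm-granite/granite-8b-code-instruct')

def chunk_tokens(text, max_tokens=7000, overlap=200):
    ids = tokenizer(text)['input_ids']
    step = max_tokens - overlap
    return [tokenizer.decode(ids[i:i + max_tokens]) for i in range(0, len(ids), step)]
```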
Frequently Asked Questions
How many programming languages does Granite Code support?
Granite Code is trained on 116 programming languages. The training data covers mainstream languages like Python, Java, JavaScript, C++, Go, and Rust, as well as less common ones. Performance is strongest on languages with the most training data.
Can Granite Code run without a GPU?
The 3B variant can run on CPU with quantization (GGUF format via llama.cpp or Ollama), though inference is slower; a sketch follows. The 8B model benefits from a dedicated GPU. The 34B model requires server-grade hardware.
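A minimal CPU-inference sketch using the `llama-cpp-python` bindings (the GGUF filename is a hypothetical placeholder; check Hugging Face for actual quantized releases):

```python
# Run a quantized GGUF build on CPU via llama.cpp bindings.
from llama_cpp import Llama

llm = Llama(model_path="granite-3b-code-instruct.Q4_K_M.gguf", n_ctx=8192)
out = llm("Write a Python function that reverses a string.", max_tokens=128)
print(out["choices"][0]["text"])
```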
How does Granite Code compare to Code Llama?
Both are open-source code LLMs. Granite Code focuses on enterprise use cases with Apache 2.0 licensing and IBM support. Code Llama uses a custom Meta license with some commercial restrictions. Benchmark performance varies by task and model size.
Can I use Granite Code commercially?
Yes. All Granite Code models are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution without royalties or usage fees.
What is the context window?
Most Granite Code variants support an 8K token context window. Some newer releases extend this. Check the specific model card on Hugging Face for the exact context length of each variant, as shown below.
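A quick way to check programmatically (assuming the variant publishes a standard `transformers` config):

```python
# Read the configured context length from the model's config file.
from transformers import AutoConfig

config = AutoConfig.from_pretrained('ibm-granite/granite-8b-code-instruct')
print(config.max_position_embeddings)  # e.g. 8192 for the 8K variants
```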
Citations (3)
- Granite Code GitHub — IBM Granite Code models trained on 116 programming languages
- Hugging Face Model Card — Apache 2.0 license for all model variants
- IBM Research Blog — Model sizes range from 3B to 34B parameters
Source & Thanks
Created by IBM Research. Licensed under Apache 2.0.
ibm-granite/granite-code-models — 2k+ stars