Knowledge · May 8, 2026 · 4 min read

xAI Grok API Quickstart — OpenAI-Compatible Frontier Model

xAI Grok API is OpenAI-compatible at api.x.ai/v1. Swap base URL + key, keep the SDK. Grok-3, Grok-2 Vision, 1M-token context.

Universal CLI install command
npx tokrepo install 1b2352a4-5be8-40c7-8b77-47d1af60b4ea
Intro

xAI's Grok API is OpenAI-compatible — point the OpenAI SDK at api.x.ai/v1 with an XAI_API_KEY, change model='grok-3', and you're running on Grok. Grok-3 is the flagship reasoning model with 1M-token context; Grok-2 Vision handles images; Live Search gives real-time web results inside the call. Best for: apps that need fresh real-time knowledge (news, prices, sports) without a separate retrieval layer; very long context tasks (>200K tokens). Works with: openai-python, openai-node, LangChain, LlamaIndex via OpenAI-compatible adapter. Setup time: 2 minutes.


Python (openai SDK)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.environ["XAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="grok-3",
    messages=[{"role": "user", "content": "What's the latest on the SpaceX launch this week?"}],
)
print(resp.choices[0].message.content)

Vision with Grok-2

resp = client.chat.completions.create(
    model="grok-2-vision-latest",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's wrong with this UI screenshot?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)

Live Search (real-time web grounding)

resp = client.chat.completions.create(
    model="grok-3",
    messages=[{"role": "user", "content": "What's BTC price right now and the top 3 reasons it moved today?"}],
    extra_body={
        "search_parameters": {
            "mode": "on",          # off | auto | on
            "sources": [{"type": "web"}, {"type": "x"}, {"type": "news"}],
            "max_search_results": 8,
        }
    },
)
print(resp.choices[0].message.content)
print(resp.usage.num_sources_used)  # how many sources Grok grounded against
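If you build the search payload in more than one place, a tiny helper keeps the documented mode values straight. This is a sketch, not part of any xAI SDK; the field names mirror the search_parameters object shown above, while the validation logic is this guide's own:

```python
def live_search_params(mode="auto", sources=("web", "x", "news"), max_results=8):
    """Build the search_parameters payload for extra_body.

    mode must be one of the documented values: off | auto | on.
    """
    if mode not in ("off", "auto", "on"):
        raise ValueError(f"mode must be off, auto, or on, got {mode!r}")
    return {
        "mode": mode,
        "sources": [{"type": s} for s in sources],
        "max_search_results": max_results,
    }
```

Pass the result as `extra_body={"search_parameters": live_search_params("on")}` in the call above.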

Model lineup

Model ID              Context     Best for
grok-3                1,000,000   Long-context reasoning, complex agents
grok-3-mini           131,072     Fast, cheap reasoning
grok-2-vision-latest  32,768      Image understanding
grok-2-image-latest   n/a         Image generation
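The lineup lends itself to a small routing helper: pick the cheapest text model whose window fits the request. The model IDs and context limits come from the table above; the selection policy and the reply-token budget are this guide's own illustrative assumptions:

```python
# Context limits per the model lineup table (tokens).
CONTEXT_LIMITS = {
    "grok-3-mini": 131_072,
    "grok-3": 1_000_000,
}

def pick_model(prompt_tokens: int, reply_budget: int = 4_096) -> str:
    """Return the smallest text model whose context fits prompt + reply."""
    needed = prompt_tokens + reply_budget
    # Try models from smallest (cheapest) to largest window.
    for model, limit in sorted(CONTEXT_LIMITS.items(), key=lambda kv: kv[1]):
        if needed <= limit:
            return model
    raise ValueError(f"request needs {needed} tokens; no Grok model fits")
```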

Pricing (per 1M tokens, May 2026)

  • Grok-3: $5 input / $15 output
  • Grok-3-mini: $0.30 input / $0.50 output
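Those rates make per-request cost estimation a one-liner. A sketch using the May 2026 snapshot above; prices drift, so treat the constants as illustrative rather than authoritative:

```python
# USD per 1M tokens, May 2026 snapshot from the pricing list above.
PRICES = {
    "grok-3":      {"input": 5.00, "output": 15.00},
    "grok-3-mini": {"input": 0.30, "output": 0.50},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For example, a full 1M-token prompt to grok-3 with no output costs about $5.00 at these rates.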

Migration from OpenAI

The only changes: base_url and the model string. Tools, vision, JSON mode, streaming — all work identically.
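In code, the entire migration fits in one small helper that captures the two-value diff. A minimal sketch; XAI_API_KEY is the environment variable name used throughout this guide:

```python
import os

def grok_client_kwargs(env_var: str = "XAI_API_KEY") -> dict:
    """Keyword arguments that retarget openai.OpenAI() at xAI.

    Everything else about the SDK call sites stays unchanged.
    """
    return {
        "base_url": "https://api.x.ai/v1",
        "api_key": os.environ.get(env_var, ""),
    }
```

Usage: `client = OpenAI(**grok_client_kwargs())`, then set `model="grok-3"` per call.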


FAQ

Q: How does Grok's 1M context compare to Gemini's 2M? A: Both handle full-corpus tasks. Grok-3 input is $5/M; Gemini 2.5 Pro is ~$1.25/M but has stricter rate limits. For typical 100K–500K-token jobs, Grok is faster end-to-end; above Grok's 1M limit, only Gemini's window fits.

Q: What sources does Live Search index? A: Web (search engine), X (Twitter posts), News (press articles), and RSS. You whitelist via the sources array. Grok returns inline citations and a num_sources_used count for verification.

Q: Is the OpenAI-compat layer feature-complete? A: Mostly — chat.completions, streaming, tools, vision, JSON mode all work. Audio (TTS / Whisper) is not on the xAI API; use OpenAI for those. Embeddings: not yet on xAI as of May 2026.


Quick Use

  1. Get key at console.x.ai
  2. OpenAI(base_url='https://api.x.ai/v1', api_key=XAI_KEY)
  3. Set model='grok-3' — done


Source & Thanks

Built by xAI. API docs at docs.x.ai.

Public SDK: xai-org

🙏
