WorkflowsMay 8, 2026·4 min read

Cohere Rerank — Boost RAG Accuracy with Rerank-3

Cohere Rerank scores candidates against a query using a cross-encoder. Drop into any RAG to boost top-1 hit rate by 30-50% over vector search alone.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100Stage only
Agent surface
Any MCP/CLI agent
Kind
Skill
Install
Stage only
Trust
Trust: New
Entrypoint
Asset
Universal CLI install command
npx tokrepo install bf323939-d2b6-4426-aa9f-9325666e7eaa
Intro

Cohere Rerank is the cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from a vector search, pass them through Rerank-3, get back the top 5-10 most relevant. Boosts top-1 hit rate by 30-50% on real RAG benchmarks. Best for: any RAG pipeline where retrieval quality is the bottleneck. Works with: Cohere REST API, Python / TypeScript SDK, AWS Bedrock, Azure. Setup time: 2 minutes.


Drop-in rerank

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)

for r in response.results:
    print(f"score={r.relevance_score:.3f}  text={docs[r.index][:100]}")

Multilingual

Rerank-v3.5 ships native multilingual support (100+ languages). Query in English, score documents in Chinese / Spanish / Arabic — works without translation.

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Picks the PT + KO docs, drops the TS one

Why rerank vs better embeddings

Reranking with a cross-encoder is a different signal than bi-encoder embeddings used for vector search. Embeddings encode each doc independently; rerank conditions doc scoring on the query. The combination (vector search → rerank) consistently beats either alone.

When to skip rerank

  • You only have ≤10 candidates and they're already good
  • Latency budget < 200ms (rerank adds ~100-200ms for 50 docs)
  • Your retrieval is already perfect (rare)

FAQ

Q: Is Cohere Rerank free? A: Free trial credits on signup. After that, $2 per 1,000 search units (one search = one query + up to 100 docs). Pricing on cohere.com/pricing. Bedrock and Azure pricing differs.

Q: How is this different from a smaller LLM doing the rerank? A: A smaller LLM via prompt-based reranking (e.g. 'rate doc 1-10 for relevance') is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained, returns calibrated scores, and runs ~10× faster than a 7B LLM.

Q: Can I run Rerank locally? A: Cohere's hosted Rerank is API-only. For local rerank, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.


Quick Use

  1. Sign up at dashboard.cohere.com → copy API key
  2. pip install cohere (or npm install cohere-ai)
  3. co.rerank(model='rerank-v3.5', query=..., documents=[...], top_n=5)

Intro

Cohere Rerank is the cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from a vector search, pass them through Rerank-3, get back the top 5-10 most relevant. Boosts top-1 hit rate by 30-50% on real RAG benchmarks. Best for: any RAG pipeline where retrieval quality is the bottleneck. Works with: Cohere REST API, Python / TypeScript SDK, AWS Bedrock, Azure. Setup time: 2 minutes.


Drop-in rerank

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)

for r in response.results:
    print(f"score={r.relevance_score:.3f}  text={docs[r.index][:100]}")

Multilingual

Rerank-v3.5 ships native multilingual support (100+ languages). Query in English, score documents in Chinese / Spanish / Arabic — works without translation.

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Picks the PT + KO docs, drops the TS one

Why rerank vs better embeddings

Reranking with a cross-encoder is a different signal than bi-encoder embeddings used for vector search. Embeddings encode each doc independently; rerank conditions doc scoring on the query. The combination (vector search → rerank) consistently beats either alone.

When to skip rerank

  • You only have ≤10 candidates and they're already good
  • Latency budget < 200ms (rerank adds ~100-200ms for 50 docs)
  • Your retrieval is already perfect (rare)

FAQ

Q: Is Cohere Rerank free? A: Free trial credits on signup. After that, $2 per 1,000 search units (one search = one query + up to 100 docs). Pricing on cohere.com/pricing. Bedrock and Azure pricing differs.

Q: How is this different from a smaller LLM doing the rerank? A: A smaller LLM via prompt-based reranking (e.g. 'rate doc 1-10 for relevance') is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained, returns calibrated scores, and runs ~10× faster than a 7B LLM.

Q: Can I run Rerank locally? A: Cohere's hosted Rerank is API-only. For local rerank, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.


Source & Thanks

Built by Cohere. Commercial product with free trial.

docs.cohere.com/rerank — Rerank documentation

🙏

Source & Thanks

Built by Cohere. Commercial product with free trial.

docs.cohere.com/rerank — Rerank documentation

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets