Workflows · May 8, 2026 · 4 min read

Cohere Rerank — Boost RAG Accuracy with Rerank-3

Cohere Rerank scores candidates against a query using a cross-encoder. Drop it into any RAG pipeline to boost top-1 hit rate by 30-50% over vector search alone.

Cohere · Community
Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content to help agents judge fit, risk, and next actions.

Score: 17/100 · Stage only
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry point: Asset

Universal CLI command:
npx tokrepo install bf323939-d2b6-4426-aa9f-9325666e7eaa
Introduction

Cohere Rerank is the cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from a vector search, pass them through Rerank-3, get back the top 5-10 most relevant. Boosts top-1 hit rate by 30-50% on real RAG benchmarks. Best for: any RAG pipeline where retrieval quality is the bottleneck. Works with: Cohere REST API, Python / TypeScript SDK, AWS Bedrock, Azure. Setup time: 2 minutes.


Drop-in rerank

import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates (vector_db is your existing vector store client)
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)

for r in response.results:
    print(f"score={r.relevance_score:.3f}  text={docs[r.index][:100]}")

Multilingual

Rerank-v3.5 ships native multilingual support (100+ languages). Query in English and score documents in Portuguese, Korean, or Chinese, as in the example below; no translation step is needed.

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Returns the Portuguese (PyTorch) and Korean (TensorFlow) docs, drops the TypeScript one

Why rerank vs better embeddings

Reranking with a cross-encoder is a different signal than bi-encoder embeddings used for vector search. Embeddings encode each doc independently; rerank conditions doc scoring on the query. The combination (vector search → rerank) consistently beats either alone.
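
To make the two signals concrete, here is a minimal side-by-side sketch. The embed model name and the toy documents are assumptions for illustration, not part of this asset:

import os

import cohere
import numpy as np

co = cohere.Client(os.environ["COHERE_API_KEY"])

query = "What is RAG?"
docs = [
    "Retrieval-augmented generation combines search with LLM generation",
    "RAG rugs are made from scraps of fabric",
]

# Bi-encoder: query and docs are embedded independently, then compared
q = np.array(co.embed(texts=[query], model="embed-english-v3.0",
                      input_type="search_query").embeddings[0])
d = np.array(co.embed(texts=docs, model="embed-english-v3.0",
                      input_type="search_document").embeddings)
bi_scores = d @ q  # similarity is computed after encoding, with no query-doc interaction

# Cross-encoder: each (query, doc) pair is scored jointly
cross = co.rerank(model="rerank-v3.5", query=query, documents=docs,
                  top_n=len(docs))
for r in cross.results:
    print(f"bi={bi_scores[r.index]:.3f}  cross={r.relevance_score:.3f}  "
          f"{docs[r.index][:60]}")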

When to skip rerank

  • You only have ≤10 candidates and they're already good (a guard for this case is sketched after this list)
  • Latency budget < 200ms (rerank adds ~100-200ms for 50 docs)
  • Your retrieval is already perfect (rare)
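
A cheap guard covers the first case: skip the API call when the candidate pool is already small. A minimal sketch; maybe_rerank and min_candidates are hypothetical names, not part of the Cohere SDK:

def maybe_rerank(co, query, docs, top_n=5, min_candidates=10):
    # Hypothetical guard: with few candidates, reranking rarely changes
    # the top results and only adds ~100-200ms of latency
    if len(docs) <= min_candidates:
        return docs[:top_n]  # trust the vector-search order as-is
    resp = co.rerank(model="rerank-v3.5", query=query,
                     documents=docs, top_n=top_n)
    return [docs[r.index] for r in resp.results]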

FAQ

Q: Is Cohere Rerank free? A: Free trial credits on signup. After that, $2 per 1,000 search units (one search = one query + up to 100 docs). Pricing on cohere.com/pricing. Bedrock and Azure pricing differs.
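
That pricing makes cost easy to estimate: units per query = ceil(docs / 100). A back-of-envelope sketch, assuming the $2 per 1,000 search units quoted above:

import math

def rerank_cost_usd(queries: int, docs_per_query: int,
                    price_per_1k_units: float = 2.00) -> float:
    # One search unit covers one query with up to 100 documents
    units = queries * math.ceil(docs_per_query / 100)
    return units / 1000 * price_per_1k_units

# 10,000 queries at 50 docs each -> 10,000 units -> $20
print(rerank_cost_usd(10_000, 50))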

Q: How is this different from a smaller LLM doing the rerank? A: A smaller LLM via prompt-based reranking (e.g. 'rate doc 1-10 for relevance') is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained, returns calibrated scores, and runs ~10× faster than a 7B LLM.
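
For contrast, here is roughly what the prompt-based alternative looks like, one chat call per document. A sketch only; the model name and prompt are assumptions, and the parsed scores are uncalibrated:

def llm_rerank_score(co, query: str, doc: str) -> int:
    # One LLM request per document: slower, pricier, and noisier than rerank
    resp = co.chat(
        model="command-r",  # assumed model name
        message=(f"Rate from 1 to 10 how relevant this document is to the "
                 f"query.\nQuery: {query}\nDocument: {doc}\n"
                 f"Answer with a number only."),
    )
    try:
        return int(resp.text.strip())
    except ValueError:
        return 0  # LLMs often return unparseable or inconsistent scores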

Q: Can I run Rerank locally? A: Cohere's hosted Rerank is API-only. For local rerank, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.
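
If you do go local, the swap is small. A sketch using the sentence-transformers CrossEncoder wrapper; the exact BGE model id is an assumption, so check Hugging Face for the current release:

# pip install sentence-transformers
from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-v2-m3")  # assumed model id

query = "What is RAG?"
docs = [
    "Retrieval-augmented generation combines search with LLM generation",
    "RAG rugs are made from scraps of fabric",
]

scores = model.predict([(query, d) for d in docs])  # higher = more relevant
for d, s in sorted(zip(docs, scores), key=lambda x: -x[1]):
    print(f"{s:.3f}  {d}")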


Quick Use

  1. Sign up at dashboard.cohere.com → copy API key
  2. pip install cohere (or npm install cohere-ai)
  3. co.rerank(model='rerank-v3.5', query=..., documents=[...], top_n=5)

Source & Thanks

Built by Cohere. Commercial product with free trial.

docs.cohere.com/rerank — Rerank documentation

🙏
