Quick Use
- Sign up at dashboard.cohere.com → copy API key
- pip install cohere (or npm install cohere-ai)
- co.rerank(model="rerank-v3.5", query=..., documents=[...], top_n=5)
Intro
Cohere Rerank is a cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from vector search, pass them through Rerank 3.5, and get back the top 5-10 most relevant. Boosts top-1 hit rate by 30-50% on real RAG benchmarks.
- Best for: any RAG pipeline where retrieval quality is the bottleneck
- Works with: Cohere REST API, Python / TypeScript SDKs, AWS Bedrock, Azure
- Setup time: 2 minutes
Drop-in rerank
import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)

for r in response.results:
    print(f"score={r.relevance_score:.3f} text={docs[r.index][:100]}")
Multilingual
Rerank-v3.5 ships native multilingual support (100+ languages). Query in English, score documents in Chinese / Spanish / Arabic — works without translation.
response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Keeps the PyTorch (PT) and TensorFlow (KO) docs, drops the TypeScript (ZH) one
Why rerank vs better embeddings
Reranking with a cross-encoder provides a different signal than the bi-encoder embeddings used for vector search. A bi-encoder encodes each document independently of the query; a cross-encoder scores each (query, document) pair jointly, so it can catch relevance cues that independent embeddings miss. The combination (vector search → rerank) consistently beats either alone.
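The two-stage pipeline can be sketched in pure Python. The scoring functions below are toy stand-ins (a bag-of-words "embedding" and a term-overlap "cross-encoder"), not Cohere's models — the point is the shape: a cheap per-document pass over the whole corpus, then a query-conditioned pass over the survivors.

```python
import math

def embed(text: str) -> list[float]:
    """Toy bi-encoder: hashed bag-of-words vector, computed per text
    with no knowledge of the query."""
    vec = [0.0] * 16
    for word in text.lower().split():
        vec[hash(word) % 16] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cross_score(query: str, doc: str) -> float:
    """Toy cross-encoder: scores the (query, doc) PAIR jointly --
    here, the fraction of query terms that appear in the doc."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / len(q_terms)

def retrieve_then_rerank(query, corpus, top_k=50, top_n=5):
    # Stage 1: cheap bi-encoder similarity over the whole corpus
    q_vec = embed(query)
    sims = [(sum(a * b for a, b in zip(q_vec, embed(d))), d) for d in corpus]
    candidates = [d for _, d in sorted(sims, reverse=True)[:top_k]]
    # Stage 2: expensive cross-encoder over the candidates only
    ranked = sorted(candidates, key=lambda d: cross_score(query, d), reverse=True)
    return ranked[:top_n]
```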
When to skip rerank
- You only have ≤10 candidates and they're already good
- Latency budget < 200ms (rerank adds ~100-200ms for 50 docs)
- Your retrieval is already perfect (rare)
FAQ
Q: Is Cohere Rerank free? A: Free trial credits on signup. After that, $2 per 1,000 search units (one search = one query + up to 100 docs). Pricing on cohere.com/pricing. Bedrock and Azure pricing differs.
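A rough illustration of the search-unit math stated above, assuming a candidate list beyond 100 docs bills additional units (that rounding rule is my reading of the FAQ, not official billing logic — verify current rates on cohere.com/pricing):

```python
import math

PRICE_PER_1K_UNITS = 2.00  # USD, per the FAQ above; check cohere.com/pricing

def search_units(n_docs: int, docs_per_unit: int = 100) -> int:
    """One search unit covers one query scored against up to 100 docs;
    assume longer candidate lists consume proportionally more units."""
    return max(1, math.ceil(n_docs / docs_per_unit))

def monthly_cost(queries_per_month: int, docs_per_query: int) -> float:
    units = queries_per_month * search_units(docs_per_query)
    return units * PRICE_PER_1K_UNITS / 1000

# e.g. 100k queries/month at 50 candidates each -> 100k units -> $200
```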
Q: How is this different from a smaller LLM doing the rerank? A: A smaller LLM via prompt-based reranking (e.g. 'rate doc 1-10 for relevance') is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained, returns calibrated scores, and runs ~10× faster than a 7B LLM.
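For contrast, prompt-based reranking looks roughly like this (`llm` is a hypothetical stand-in for any chat-completion call, not a real API): one round trip per document, plus free-text parsing that makes the scores noisy.

```python
import re

def llm(prompt: str) -> str:
    """Stand-in for a chat-completion call. A real model returns free
    text, which is why the score has to be parsed back out."""
    return "Relevance: 7/10"  # canned reply for illustration

def prompt_rerank(query: str, docs: list[str], top_n: int = 5) -> list[str]:
    scored = []
    for doc in docs:  # one LLM round trip PER document
        reply = llm(f"Rate this document 1-10 for relevance to {query!r}:\n{doc}")
        m = re.search(r"(\d+)\s*/\s*10", reply)
        score = int(m.group(1)) if m else 0  # parsing can silently fail
        scored.append((score, doc))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]
```

Compared to a single co.rerank call returning calibrated floats, this costs N requests and depends on the model emitting a parseable number every time.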
Q: Can I run Rerank locally? A: Cohere's hosted Rerank is API-only. For local rerank, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.
Source & Thanks
Built by Cohere. Commercial product with free trial.
docs.cohere.com/rerank — Rerank documentation