Workflows · May 8, 2026 · 4 min read

Cohere Rerank — Boost RAG Accuracy with Rerank-3

Cohere Rerank scores candidates against a query using a cross-encoder. Drop it into any RAG pipeline to boost top-1 hit rate by 30-50% over vector search alone.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: any MCP/CLI agent
Type: Skill
Install: Stage only
Trust: New
Entry: Asset
Universal CLI command:
npx tokrepo install bf323939-d2b6-4426-aa9f-9325666e7eaa
Introduction

Cohere Rerank is the cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from a vector search, pass them through Rerank-3, get back the top 5-10 most relevant. Boosts top-1 hit rate by 30-50% on real RAG benchmarks. Best for: any RAG pipeline where retrieval quality is the bottleneck. Works with: Cohere REST API, Python / TypeScript SDK, AWS Bedrock, Azure. Setup time: 2 minutes.


Drop-in rerank

import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)

for r in response.results:
    print(f"score={r.relevance_score:.3f}  text={docs[r.index][:100]}")

Multilingual

Rerank-v3.5 ships native multilingual support (100+ languages). Query in English, score documents in Chinese / Spanish / Arabic — works without translation.

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Picks the PT + KO docs, drops the TS one

Why rerank vs better embeddings

Reranking with a cross-encoder is a different signal than bi-encoder embeddings used for vector search. Embeddings encode each doc independently; rerank conditions doc scoring on the query. The combination (vector search → rerank) consistently beats either alone.
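The two-signal point above can be sketched in a few lines. This is a toy illustration, not Cohere's actual models: `bi_encoder_score` works on document vectors computed with no knowledge of the query, while `cross_encoder_score` is a stand-in (plain token overlap) for a model that scores the (query, doc) pair jointly.

```python
def bi_encoder_score(query_vec, doc_vec):
    # Dot product of independently computed embeddings (the vector-search signal).
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query, doc):
    # Stand-in for a cross-encoder: fraction of query tokens found in the doc.
    # The key property: the score is conditioned on the query, not precomputed.
    q_tokens = set(query.lower().split())
    return len(q_tokens & set(doc.lower().split())) / len(q_tokens)

def retrieve_then_rerank(query, query_vec, corpus, top_k=50, top_n=5):
    # Stage 1: cheap bi-encoder recall over the whole corpus.
    shortlist = sorted(
        corpus, key=lambda d: bi_encoder_score(query_vec, d["vec"]), reverse=True
    )[:top_k]
    # Stage 2: expensive query-conditioned scoring over the shortlist only.
    return sorted(
        shortlist, key=lambda d: cross_encoder_score(query, d["text"]), reverse=True
    )[:top_n]
```

The shape is the same as the drop-in example above: recall wide with embeddings, then let the query-conditioned scorer reorder the shortlist.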

When to skip rerank

  • You only have ≤10 candidates and they're already good
  • Latency budget < 200ms (rerank adds ~100-200ms for 50 docs)
  • Your retrieval is already perfect (rare)
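The checklist above folds into a small guard. The thresholds are just the rules of thumb quoted in the list (≤10 candidates, ~200 ms added latency), not anything Cohere prescribes:

```python
def should_rerank(n_candidates: int, latency_budget_ms: float,
                  rerank_latency_ms: float = 200.0) -> bool:
    # With 10 or fewer already-good candidates there is little to reorder.
    if n_candidates <= 10:
        return False
    # The rerank call itself adds roughly 100-200 ms for ~50 docs.
    if latency_budget_ms < rerank_latency_ms:
        return False
    return True
```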

FAQ

Q: Is Cohere Rerank free? A: Free trial credits on signup. After that, $2 per 1,000 search units (one search = one query + up to 100 docs). Pricing on cohere.com/pricing. Bedrock and Azure pricing differs.
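A back-of-the-envelope estimator from those numbers. One assumption is hedged here: the FAQ defines a unit as one query with up to 100 docs, and this sketch assumes larger batches bill one unit per additional 100 docs, which the FAQ does not confirm.

```python
import math

def rerank_cost_usd(n_queries: int, docs_per_query: int,
                    price_per_1k_units: float = 2.0) -> float:
    # One search unit = one query + up to 100 docs; assume (not confirmed
    # above) that bigger batches bill ceil(docs / 100) units per query.
    units_per_query = math.ceil(docs_per_query / 100)
    return n_queries * units_per_query * price_per_1k_units / 1000
```

Example: 1,000 queries at 50 docs each is 1,000 units, i.e. $2 under this assumption.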

Q: How is this different from a smaller LLM doing the rerank? A: A smaller LLM via prompt-based reranking (e.g. 'rate doc 1-10 for relevance') is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained, returns calibrated scores, and runs ~10× faster than a 7B LLM.

Q: Can I run Rerank locally? A: Cohere's hosted Rerank is API-only. For local rerank, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.


Quick Use

  1. Sign up at dashboard.cohere.com → copy API key
  2. pip install cohere (or npm install cohere-ai)
  3. co.rerank(model='rerank-v3.5', query=..., documents=[...], top_n=5)



Source & Thanks

Built by Cohere. Commercial product with free trial.

docs.cohere.com/rerank — Rerank documentation

🙏

