# Cohere Rerank — Boost RAG Accuracy with Rerank-3

> Cohere Rerank scores candidates against a query using a cross-encoder. Drop it into any RAG pipeline to boost top-1 hit rate by 30-50% over vector search alone.

## Quick Use

1. Sign up at dashboard.cohere.com → copy your API key
2. `pip install cohere` (or `npm install cohere-ai`)
3. `co.rerank(model='rerank-v3.5', query=..., documents=[...], top_n=5)`

---

## Intro

Cohere Rerank is a cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from a vector search, pass them through Rerank-3, and get back the 5-10 most relevant. It boosts top-1 hit rate by 30-50% on real RAG benchmarks.

Best for: any RAG pipeline where retrieval quality is the bottleneck.
Works with: Cohere REST API, Python / TypeScript SDK, AWS Bedrock, Azure.
Setup time: 2 minutes.

---

### Drop-in rerank

```python
import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)
for r in response.results:
    print(f"score={r.relevance_score:.3f} text={docs[r.index][:100]}")
```

### Multilingual

Rerank-v3.5 ships native multilingual support (100+ languages). Query in English, score documents in Chinese / Spanish / Arabic — it works without translation.

```python
response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Picks the PT + KO docs, drops the TypeScript one
```

### Why rerank vs better embeddings

A cross-encoder reranker provides a different signal than the bi-encoder embeddings used for vector search.
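To make that distinction concrete, here is a toy sketch in pure Python. The scoring functions are deliberately simplistic stand-ins (bag-of-words cosine, query-term coverage), not Cohere's models; the point is the shape of the pipeline: a bi-encoder embeds each text independently, while a cross-encoder sees query and document together, so you run the cheap scorer over everything and the expensive one over a shortlist.

```python
from collections import Counter
from math import sqrt


# --- Toy bi-encoder: each text is embedded INDEPENDENTLY ---
def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# --- Toy cross-encoder: query and doc are scored TOGETHER ---
def cross_score(query: str, doc: str) -> float:
    # Stand-in joint scorer: fraction of query terms the doc covers.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)


def retrieve_then_rerank(query, docs, top_k=50, top_n=5):
    # Stage 1: fast independent scoring over ALL docs
    q_emb = embed(query)
    shortlist = sorted(docs, key=lambda d: cosine(q_emb, embed(d)), reverse=True)[:top_k]
    # Stage 2: slower query-conditioned scoring over the shortlist only
    return sorted(shortlist, key=lambda d: cross_score(query, d), reverse=True)[:top_n]
```

In a real pipeline, `embed` is your embedding model and stage 2 is the `co.rerank` call; `vector_db`, `top_k`, and `top_n` map directly onto the drop-in example above.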
Embeddings encode each doc independently; rerank conditions doc scoring on the query. The combination (vector search → rerank) consistently beats either alone.

### When to skip rerank

- You only have ≤10 candidates and they're already good
- Your latency budget is under 200ms (rerank adds ~100-200ms for 50 docs)
- Your retrieval is already perfect (rare)

---

### FAQ

**Q: Is Cohere Rerank free?**
A: Free trial credits on signup. After that, $2 per 1,000 search units (one search unit = one query with up to 100 docs). Pricing is on cohere.com/pricing; Bedrock and Azure pricing differs.

**Q: How is this different from a smaller LLM doing the rerank?**
A: Prompt-based reranking with a smaller LLM (e.g. "rate this doc 1-10 for relevance") is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained for reranking, returns calibrated scores, and runs ~10× faster than a 7B LLM.

**Q: Can I run Rerank locally?**
A: Cohere's hosted Rerank is API-only. For local reranking, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.

---

## Source & Thanks

> Built by [Cohere](https://github.com/cohere-ai). Commercial product with free trial.
>
> [docs.cohere.com/rerank](https://docs.cohere.com/docs/rerank-overview) — Rerank documentation

---

Source: https://tokrepo.com/en/workflows/cohere-rerank-boost-rag-accuracy-with-rerank-3
Author: Cohere
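As a footnote to the pricing FAQ above, a quick back-of-envelope cost estimator. This assumes the billing unit described there (one search unit covers one query with up to 100 documents, $2 per 1,000 units) and that a query over more than 100 docs bills as multiple units; the function is ours, not a Cohere API, so check cohere.com/pricing for current numbers.

```python
from math import ceil

PRICE_PER_1K_UNITS = 2.00  # USD, per the FAQ above; verify on cohere.com/pricing


def rerank_cost(num_queries: int, docs_per_query: int) -> float:
    """Estimated USD cost, assuming ceil(docs / 100) search units per query."""
    # e.g. one query over 250 docs consumes ceil(250 / 100) = 3 units
    units = num_queries * ceil(docs_per_query / 100)
    return units * PRICE_PER_1K_UNITS / 1000
```

For example, 10,000 queries at 50 docs each is 10,000 units, roughly $20.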