# Cohere Rerank — Boost RAG Accuracy with Rerank-3

> Cohere Rerank scores candidates against a query using a cross-encoder. Drop it into any RAG pipeline to boost top-1 hit rate by 30-50% over vector search alone.

## Quick Use

1. Sign up at dashboard.cohere.com → copy your API key
2. `pip install cohere` (or `npm install cohere-ai`)
3. `co.rerank(model='rerank-v3.5', query=..., documents=[...], top_n=5)`

---

## Intro

Cohere Rerank is a cross-encoder reranking layer that sits between your vector search and your LLM. Take the top 50-100 candidates from a vector search, pass them through Rerank-3, and get back the 5-10 most relevant. It boosts top-1 hit rate by 30-50% on real RAG benchmarks.

Best for: any RAG pipeline where retrieval quality is the bottleneck.
Works with: Cohere REST API, Python / TypeScript SDK, AWS Bedrock, Azure.
Setup time: 2 minutes.

---

### Drop-in rerank

```python
import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank to top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)
for r in response.results:
    print(f"score={r.relevance_score:.3f} text={docs[r.index][:100]}")
```

### Multilingual

Rerank-v3.5 ships native multilingual support (100+ languages). Query in English, score documents in Chinese / Spanish / Arabic — it works without translation.

```python
response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Picks the PT + KO docs, drops the TypeScript one
```

### Why rerank vs better embeddings

A cross-encoder reranker provides a different signal than the bi-encoder embeddings used for vector search.
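To make that distinction concrete, here is a toy sketch in pure Python. The scoring functions are deliberately simplistic stand-ins (bag-of-words cosine, query-term coverage), not Cohere's models; the point is the shape of the pipeline: a bi-encoder embeds each text independently, while a cross-encoder sees query and document together, so you run the cheap scorer over everything and the expensive one over a shortlist.

```python
from collections import Counter
from math import sqrt


# --- Toy bi-encoder: each text is embedded INDEPENDENTLY ---
def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# --- Toy cross-encoder: query and doc are scored TOGETHER ---
def cross_score(query: str, doc: str) -> float:
    # Stand-in joint scorer: fraction of query terms the doc covers.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)


def retrieve_then_rerank(query, docs, top_k=50, top_n=5):
    # Stage 1: fast independent scoring over ALL docs
    q_emb = embed(query)
    shortlist = sorted(docs, key=lambda d: cosine(q_emb, embed(d)), reverse=True)[:top_k]
    # Stage 2: slower query-conditioned scoring over the shortlist only
    return sorted(shortlist, key=lambda d: cross_score(query, d), reverse=True)[:top_n]
```

In a real pipeline, `embed` is your embedding model and stage 2 is the `co.rerank` call; `vector_db`, `top_k`, and `top_n` map directly onto the drop-in example above.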
Embeddings encode each doc independently; rerank conditions doc scoring on the query. The combination (vector search → rerank) consistently beats either alone.

### When to skip rerank

- You only have ≤10 candidates and they're already good
- Your latency budget is under 200ms (rerank adds ~100-200ms for 50 docs)
- Your retrieval is already perfect (rare)

---

### FAQ

**Q: Is Cohere Rerank free?**
A: Free trial credits on signup. After that, $2 per 1,000 search units (one search unit = one query with up to 100 docs). Pricing is on cohere.com/pricing; Bedrock and Azure pricing differs.

**Q: How is this different from a smaller LLM doing the rerank?**
A: Prompt-based reranking with a smaller LLM (e.g. "rate this doc 1-10 for relevance") is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained for reranking, returns calibrated scores, and runs ~10× faster than a 7B LLM.

**Q: Can I run Rerank locally?**
A: Cohere's hosted Rerank is API-only. For local reranking, BGE-Reranker (open-source, runs on Ollama) is the closest equivalent — slightly lower accuracy on English, comparable on multilingual.

---

## Source & Thanks

> Built by [Cohere](https://github.com/cohere-ai). Commercial product with free trial.
>
> [docs.cohere.com/rerank](https://docs.cohere.com/docs/rerank-overview) — Rerank documentation

---

Source: https://tokrepo.com/en/workflows/cohere-rerank-boost-rag-accuracy-with-rerank-3
Author: Cohere
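As a footnote to the pricing FAQ above, a quick back-of-envelope cost estimator. This assumes the billing unit described there (one search unit covers one query with up to 100 documents, $2 per 1,000 units) and that a query over more than 100 docs bills as multiple units; the function is ours, not a Cohere API, so check cohere.com/pricing for current numbers.

```python
from math import ceil

PRICE_PER_1K_UNITS = 2.00  # USD, per the FAQ above; verify on cohere.com/pricing


def rerank_cost(num_queries: int, docs_per_query: int) -> float:
    """Estimated USD cost, assuming ceil(docs / 100) search units per query."""
    # e.g. one query over 250 docs consumes ceil(250 / 100) = 3 units
    units = num_queries * ceil(docs_per_query / 100)
    return units * PRICE_PER_1K_UNITS / 1000
```

For example, 10,000 queries at 50 docs each is 10,000 units, roughly $20.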