What is Cohere Rerank — Boost RAG Accuracy with Rerank-3?

Cohere Rerank scores each candidate document against a query using a cross-encoder. Drop it into any RAG pipeline to raise top-1 hit rate by 30-50% over vector search alone.

Is Cohere Rerank — Boost RAG Accuracy with Rerank-3 free to use?

Yes. The asset itself is freely available on TokRepo; use of the Cohere API is billed separately once trial credits run out. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Cohere Rerank — Boost RAG Accuracy with Rerank-3?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Cohere Rerank — Boost RAG Accuracy with Rerank-3

Name: Cohere Rerank — Boost RAG Accuracy with Rerank-3
Author: Cohere

Introduction

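Cohere Rerank is a cross-encoder reranking layer that sits between vector search and the LLM. Take the top 50-100 candidates from vector search, pass them through Rerank-3, and get back the 5-10 most relevant ones. On real-world RAG benchmarks this raises top-1 hit rate by 30-50%. It suits any RAG pipeline where retrieval quality is the bottleneck. It is available through the Cohere REST API, the Python / TypeScript SDKs, AWS Bedrock, and Azure. Setup takes about 2 minutes.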

Drop-in reranking

import os

import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# 1. Vector search returns 50 candidates.
#    vector_db is a placeholder for your own vector store client.
candidates = vector_db.query(query="What is RAG?", top_k=50)
docs = [c.text for c in candidates]

# 2. Rerank down to the top 5
response = co.rerank(
    model="rerank-v3.5",
    query="What is RAG?",
    documents=docs,
    top_n=5,
)

# Each result carries the index of its document in the original docs list
for r in response.results:
    print(f"score={r.relevance_score:.3f}  text={docs[r.index][:100]}")
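The snippet above can be wrapped in a small helper that returns the reranked texts together with their scores. This is a minimal sketch, not part of the Cohere SDK: `rerank_texts` and `FakeRerankClient` are hypothetical names, and the fake client only imitates the `rerank(...)` call shape and the `index` / `relevance_score` fields used above so that the helper can be tried without an API key.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


def rerank_texts(co: Any, query: str, docs: List[str], top_n: int = 5,
                 model: str = "rerank-v3.5") -> List[Tuple[str, float]]:
    """Return (text, relevance_score) pairs for the top_n documents."""
    if not docs:
        return []  # nothing to rerank, so skip the API call
    response = co.rerank(
        model=model,
        query=query,
        documents=docs,
        top_n=min(top_n, len(docs)),
    )
    # Map each result back to its text through the original index
    return [(docs[r.index], r.relevance_score) for r in response.results]


# Hypothetical offline stand-in for cohere.Client (illustration only).
@dataclass
class FakeResult:
    index: int
    relevance_score: float


@dataclass
class FakeResponse:
    results: List[FakeResult]


class FakeRerankClient:
    """Scores by shared-word count. This is NOT how a real reranker scores."""

    def rerank(self, model: str, query: str, documents: List[str], top_n: int) -> FakeResponse:
        q_words = set(query.lower().split())
        scored = [
            FakeResult(i, float(len(q_words & set(d.lower().split()))))
            for i, d in enumerate(documents)
        ]
        scored.sort(key=lambda r: r.relevance_score, reverse=True)
        return FakeResponse(scored[:top_n])
```

With a real client, the call would presumably look like `rerank_texts(co, "What is RAG?", docs, top_n=5)`, where `co` is the `cohere.Client` created earlier.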

Multilingual

Rerank-v3.5 has built-in multilingual support (100+ languages). An English query can score Chinese, Spanish, or Arabic documents directly, with no translation step.

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning libraries",
    documents=[
        "PyTorch é uma biblioteca de aprendizado de máquina em Python",
        "TensorFlow는 Google이 만든 머신러닝 프레임워크입니다",
        "TypeScript 是 JavaScript 的超集",
    ],
    top_n=2,
)
# Expected: the PT and KO documents are selected and the TypeScript one is dropped

Why rerank instead of using better embeddings

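Cross-encoder reranking draws on a different signal than the bi-encoder embeddings used in vector search. Embeddings encode each document independently of the query; a reranker scores each document conditioned on the query. Combining the two (vector search → rerank) consistently beats using either one alone.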
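The difference between independent encoding and query-conditioned scoring can be illustrated with a toy two-stage search. Everything here is an assumption made for illustration: `embed` is a bag-of-words stand-in for a bi-encoder, `bigram_overlap` is a stand-in for a cross-encoder, and neither resembles what Cohere's models actually compute. The sketch only shows the shape of the pipeline: stage 1 compares vectors built from each text in isolation, stage 2 scores each (query, document) pair jointly.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bi-encoder: a word-count vector built from one text in isolation."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bigram_overlap(query: str, doc: str) -> float:
    """Toy pair scorer: counts word bigrams shared by query and document."""
    def bigrams(text: str) -> set:
        words = text.lower().split()
        return set(zip(words, words[1:]))
    return float(len(bigrams(query) & bigrams(doc)))


def two_stage_search(query: str, corpus: List[str],
                     pair_score: Callable[[str, str], float],
                     k_retrieve: int = 50, k_final: int = 5) -> List[str]:
    # Stage 1: document vectors are built without ever seeing the query
    doc_vecs = [embed(d) for d in corpus]
    q_vec = embed(query)
    shortlist = sorted(range(len(corpus)),
                       key=lambda i: cosine(q_vec, doc_vecs[i]),
                       reverse=True)[:k_retrieve]
    # Stage 2: every shortlisted document is scored together with the query
    reranked = sorted(shortlist,
                      key=lambda i: pair_score(query, corpus[i]),
                      reverse=True)
    return [corpus[i] for i in reranked[:k_final]]
```

In this toy setup, "man bites dog" and "dog bites man" get identical stage-1 similarity to the query "dog bites man" because word order is lost in the vectors; only the pair scorer in stage 2 can tell them apart.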

When not to rerank

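You have 10 or fewer candidates and they are already good enough
Your latency budget is under 200 ms (reranking 50 documents adds 100-200 ms)
Retrieval is already perfect (rare)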
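The first two conditions above can be encoded as a simple gate in front of the rerank call. This is a sketch under stated assumptions: `should_rerank` is a hypothetical helper, the default thresholds are taken from the list above, and the "already good enough" and "retrieval is already perfect" judgments are not modeled because they depend on your own evaluation data.

```python
def should_rerank(num_candidates: int, latency_budget_ms: float,
                  min_candidates: int = 10,
                  rerank_overhead_ms: float = 200.0) -> bool:
    """Decide whether a rerank step is worth adding for this request.

    Simplification: a shortlist of min_candidates or fewer is always skipped,
    without checking whether those candidates are actually good enough.
    """
    if num_candidates <= min_candidates:
        return False  # too few candidates for reranking to change much
    if latency_budget_ms < rerank_overhead_ms:
        return False  # the extra 100-200 ms would not fit in the budget
    return True
```

A caller might use it as `if should_rerank(len(docs), latency_budget_ms=500): ...` and otherwise pass the vector search results straight to the LLM.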

FAQ

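Q: Is Cohere Rerank free? A: Signing up comes with trial credits. After that it costs $2 per 1,000 search units (one search = one query plus up to 100 documents). See cohere.com/pricing for current prices. Pricing on Bedrock and Azure differs.

Q: How does this differ from reranking with a small LLM? A: Prompt-based reranking with a small LLM (for example, "rate documents 1-10 for relevance") is slower, more expensive, and noisier. Rerank-v3.5 is purpose-trained, returns calibrated scores, and runs roughly 10x faster than a 7B LLM.

Q: Can I run Rerank locally? A: Cohere's hosted Rerank is API-only. For local reranking, BGE-Reranker (open source, can run on Ollama) is the closest equivalent: slightly lower accuracy on English, comparable on multilingual.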
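As a worked example of the pricing quoted in the FAQ, the sketch below estimates API cost from query volume. The $2 per 1,000 search units and the 100-documents-per-unit figures come from the FAQ; the function name is hypothetical, and rounding a query with more than 100 documents up to multiple search units is an assumption that should be checked against cohere.com/pricing.

```python
import math

PRICE_PER_1000_SEARCH_UNITS_USD = 2.0  # figure quoted in the FAQ
DOCS_PER_SEARCH_UNIT = 100             # one search = one query + up to 100 documents


def estimate_rerank_cost(num_queries: int, docs_per_query: int) -> float:
    """Estimate cost in USD for num_queries rerank calls.

    Assumption: a query with more than 100 documents is billed as
    ceil(docs / 100) search units.
    """
    units_per_query = max(1, math.ceil(docs_per_query / DOCS_PER_SEARCH_UNIT))
    total_units = num_queries * units_per_query
    return total_units * PRICE_PER_1000_SEARCH_UNITS_USD / 1000
```

Under these assumptions, 100,000 queries with 50 candidates each come to 100,000 search units, or $200.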

Related assets

Pinecone Assistant — Managed RAG Service with Auto-Indexing

Cohere Embed — Multilingual AI Embeddings API

Cohere Command R — Long-Context Tool-Use Model for Agents

Turbopuffer — Serverless Vector DB for AI Search