Prompts · Apr 8, 2026 · 2 min read

Cohere Embed — Multilingual AI Embeddings API

Generate high-quality multilingual embeddings for search and RAG. Cohere Embed v4 supports 100+ languages with specialized input types for documents, queries, and classification.

Prompt Lab · Community
Quick Use

Use it first, then decide how deep to go

Install the SDK, paste in your API key, and generate your first embeddings:

pip install cohere
import cohere

co = cohere.ClientV2(api_key="...")

# Generate embeddings
response = co.embed(
    texts=["What is machine learning?", "How does AI work?"],
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float"],
)

print(len(response.embeddings.float_[0]))  # embedding dimensionality (depends on model/config)

What is Cohere Embed?

Cohere Embed is a multilingual embedding API that converts text into high-dimensional vectors for semantic search, RAG, and classification. Version 4.0 supports 100+ languages, offers specialized input types (document vs. query), and includes built-in compression for storage efficiency. It consistently ranks among the top embedding models on the MTEB benchmark.

Answer-Ready: Cohere Embed v4.0 generates multilingual embeddings for search and RAG. Top MTEB benchmark scores, 100+ languages, specialized input types (document/query/classification). Binary and int8 compression for 32x storage savings. Production API with generous free tier.

Best for: Teams building multilingual search or RAG pipelines. Works with: Any vector database, LangChain, LlamaIndex. Setup time: Under 2 minutes.
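Once you have document and query vectors back from the API, ranking is a local cosine-similarity computation. A minimal sketch in plain Python, using tiny hypothetical 3-d vectors in place of real `co.embed` output (in practice you would embed documents with `input_type="search_document"` and the query with `input_type="search_query"`):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
docs = {
    "ml_intro": [0.9, 0.1, 0.0],
    "cooking":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Rank documents by similarity to the query
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # most similar document
```

The same ranking loop works unchanged on real 1024-dimensional vectors; only the inputs change.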

Core Features

1. Input Types

# Different modes optimize for different tasks
docs = co.embed(texts=[...], input_type="search_document")    # For indexing
queries = co.embed(texts=[...], input_type="search_query")     # For searching
classify = co.embed(texts=[...], input_type="classification")  # For classification
cluster = co.embed(texts=[...], input_type="clustering")       # For clustering

2. Compression (32x Savings)

response = co.embed(
    texts=["Hello world"],
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float", "int8", "ubinary"],
)

# float: 1024 x 4 bytes = 4KB per vector
# int8:  1024 x 1 byte  = 1KB per vector (4x savings)
# binary: 1024 / 8 bytes = 128B per vector (32x savings)
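The `ubinary` payload packs one bit per dimension into bytes, so binary vectors can be compared by Hamming distance instead of cosine similarity. A sketch with NumPy, using hypothetical 16-dimension packed vectors (real ones would come from `response.embeddings.ubinary`):

```python
import numpy as np

def hamming_distance(packed_a, packed_b):
    # Unpack byte-packed binary embeddings (1 bit per dimension)
    # and count the bits that differ.
    bits_a = np.unpackbits(np.array(packed_a, dtype=np.uint8))
    bits_b = np.unpackbits(np.array(packed_b, dtype=np.uint8))
    return int(np.sum(bits_a != bits_b))

# Two hypothetical 16-dimension binary embeddings, 2 bytes each
vec_a = [0b10110000, 0b00001111]
vec_b = [0b10100000, 0b00001111]

print(hamming_distance(vec_a, vec_b))  # 1 bit differs
```

Lower Hamming distance means more similar vectors; on packed bytes this is fast enough to scan millions of vectors before re-ranking the top hits with float embeddings.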

3. Multilingual (100+ Languages)

# Same model handles all languages — no separate models needed
texts = [
    "What is AI?",           # English
    "AI 是什么?",            # Chinese
    "AIとは何ですか?",       # Japanese
    "Was ist KI?",           # German
]
response = co.embed(texts=texts, model="embed-v4.0", input_type="search_document")
# Cross-lingual similarity works automatically

4. Batch Processing

# Embed up to 96 texts per request (the API's batch limit)
def chunks(items, size):
    for i in range(0, len(items), size):
        yield items[i : i + size]

all_embeddings = []
for batch in chunks(documents, 96):
    response = co.embed(texts=batch, model="embed-v4.0", input_type="search_document")
    all_embeddings.extend(response.embeddings.float_)

Cohere Embed vs Alternatives

Model                          | Dimensions | Languages | MTEB Score | Compression
Cohere Embed v4.0              | 1024       | 100+      | Top 3      | float/int8/binary
OpenAI text-embedding-3-large  | 3072       | 50+       | Top 5      | Matryoshka
Voyage AI v3                   | 1024       | 20+       | Top 5      | No
BGE-M3 (open source)           | 1024       | 100+      | Good       | No

Pricing

Tier       | Embeddings/mo | Price
Free       | 1M            | $0
Production | Pay-as-you-go | $0.10 per 1M tokens
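At pay-as-you-go rates the cost math is linear. A quick estimator, assuming the $0.10 per million tokens figure above (actual token counts depend on your text and the tokenizer):

```python
PRICE_PER_MILLION_TOKENS = 0.10  # USD, pay-as-you-go rate quoted above

def embedding_cost(total_tokens):
    # Linear pay-as-you-go cost estimate
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# e.g. 1M documents averaging ~200 tokens each
print(f"${embedding_cost(200_000_000):.2f}")  # $20.00
```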

FAQ

Q: How does it compare to OpenAI embeddings? A: Comparable quality on MTEB, better multilingual support, and built-in binary compression for significant storage savings.

Q: Can I use it with Pinecone/Qdrant/Weaviate? A: Yes, generate embeddings with Cohere and store in any vector database.

Q: Is there an open-source alternative? A: BGE-M3 and E5-Mistral are strong open-source options, but require self-hosting.


Source & Thanks

Created by Cohere.

cohere.com/embed — Multilingual embedding API
