# Pinecone Inference — Hosted Embeddings & Reranking API

> Pinecone Inference is a managed embedding + reranking endpoint. Use llama-text-embed-v2 or other models without managing GPU infrastructure.

## Install

`pip install pinecone` (≥5.0; the Inference client ships with the standard SDK)

## Quick Use

1. `pip install pinecone` (≥5.0)
2. `pc = Pinecone(api_key=...)`; call `pc.inference.embed(model=..., inputs=[...])`
3. For RAG, follow with `pc.inference.rerank(...)` on the top candidates

---

## Intro

Pinecone Inference is the hosted embedding + reranking layer that complements Pinecone's vector index. Generate embeddings with llama-text-embed-v2, multilingual-e5, or pluggable third-party models without running your own GPU. The reranking endpoint scores candidate documents with bge-reranker for higher RAG accuracy.

Best for: anyone using Pinecone who'd rather not run an embedding service.

Works with: Pinecone Python / TypeScript SDK, REST API (a raw REST sketch appears after the FAQ).

Setup time: 2 minutes.

---

### Generate embeddings

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

embeddings = pc.inference.embed(
    model="llama-text-embed-v2",
    inputs=[
        "Pinecone is a managed vector database",
        "Weaviate is also a vector database",
    ],
    parameters={"input_type": "passage", "truncate": "END"},
)

# Use the embeddings directly with a Pinecone index
index = pc.Index("my-index")
index.upsert(vectors=[
    {"id": "doc1", "values": embeddings[0].values, "metadata": {"text": "..."}},
    {"id": "doc2", "values": embeddings[1].values, "metadata": {"text": "..."}},
])
```

### Embed the query, then search

```python
# Embed the query (note input_type="query", not "passage")
query_emb = pc.inference.embed(
    model="llama-text-embed-v2",
    inputs=["What is a managed vector database?"],
    parameters={"input_type": "query"},
)

# Search the index
results = index.query(
    vector=query_emb[0].values,
    top_k=10,
    include_metadata=True,
)
```

### Rerank candidate documents

```python
reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="What is a managed vector database?",
    documents=[r.metadata["text"] for r in results.matches],
    top_n=5,
    return_documents=True,
)

# reranked.data contains the top 5 most relevant documents, scored 0-1
for r in reranked.data:
    print(r.score, r.document.text)
```

These three steps compose into a single retrieval call; a consolidated helper is sketched after the FAQ.

### Why use Inference vs running your own embedding service

- No GPU to manage — Pinecone hosts and scales the model
- Same SDK as the index (no extra auth or billing setup)
- Inference is included in Pinecone Standard / Enterprise plans
- Latency optimized for use with Pinecone's index (same network)

---

### FAQ

**Q: Is Pinecone Inference free?**
A: There's a free tier (2K embeddings/month). Beyond that it's pay-as-you-go, bundled into Pinecone's Standard plan. Free for testing; cost scales with your index usage.

**Q: Which models are available?**
A: llama-text-embed-v2 (1024-dim), multilingual-e5-large, pinecone-sparse-english-v0, and bge-reranker-v2-m3 (reranking). Pinecone adds models periodically — check their docs for the current list.

**Q: Can I use Pinecone Inference without the Pinecone index?**
A: Yes — Inference is a separate API. Generate embeddings and store them anywhere (Postgres pgvector, your own DB); a hedged pgvector sketch follows below. The bundled use case (embed + index in one Pinecone account) is just convenient.

---
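### Putting it together (sketch)

The embed, query, and rerank snippets above chain naturally into one retrieval step. Below is a minimal sketch that composes them, using only the SDK calls shown earlier; the function name `retrieve`, the index name `my-index`, and the metadata key `"text"` are illustrative choices from this guide, not part of the SDK.

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("my-index")  # assumes an index populated as shown above


def retrieve(question: str, top_k: int = 10, top_n: int = 5):
    """Embed the question, fetch top_k candidates, rerank to top_n."""
    # 1. Embed the query (input_type="query", matching the passage/query split above)
    query_emb = pc.inference.embed(
        model="llama-text-embed-v2",
        inputs=[question],
        parameters={"input_type": "query"},
    )

    # 2. First-stage retrieval from the vector index
    results = index.query(
        vector=query_emb[0].values,
        top_k=top_k,
        include_metadata=True,
    )

    # 3. Second-stage rerank over the candidates' stored text
    reranked = pc.inference.rerank(
        model="bge-reranker-v2-m3",
        query=question,
        documents=[m.metadata["text"] for m in results.matches],
        top_n=top_n,
        return_documents=True,
    )
    return [(r.score, r.document.text) for r in reranked.data]


for score, text in retrieve("What is a managed vector database?"):
    print(f"{score:.3f}  {text}")
```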
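### Using Inference without the Pinecone index (sketch)

To make the last FAQ answer concrete, here is a minimal sketch that stores Inference embeddings in Postgres with pgvector instead of a Pinecone index. The `psycopg` driver, the `docs` table, and its column names are assumptions for illustration; the `vector(1024)` column matches llama-text-embed-v2's 1024-dim output noted in the FAQ.

```python
import os

import psycopg  # assumed driver; any Postgres client works
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

docs = ["Pinecone is a managed vector database"]
emb = pc.inference.embed(
    model="llama-text-embed-v2",
    inputs=docs,
    parameters={"input_type": "passage"},
)

with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    # llama-text-embed-v2 produces 1024-dim vectors (see FAQ above)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id serial PRIMARY KEY, text text, embedding vector(1024))"
    )
    for text, e in zip(docs, emb):
        # pgvector accepts the '[x, y, ...]' text literal; cast it explicitly
        conn.execute(
            "INSERT INTO docs (text, embedding) VALUES (%s, %s::vector)",
            (text, str(e.values)),
        )
```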
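### Calling the REST API directly (sketch)

The same embedding endpoint is reachable over plain HTTP, which is what the SDKs wrap. A hedged sketch with `requests` follows; the endpoint path, the `X-Pinecone-API-Version` header value, and the response shape reflect Pinecone's REST docs at the time of writing and may change, so verify them against docs.pinecone.io before relying on this.

```python
import os

import requests

resp = requests.post(
    "https://api.pinecone.io/embed",
    headers={
        "Api-Key": os.environ["PINECONE_API_KEY"],
        "Content-Type": "application/json",
        # REST calls are versioned by header; check the docs for the current value
        "X-Pinecone-API-Version": "2024-10",
    },
    json={
        "model": "llama-text-embed-v2",
        "parameters": {"input_type": "query"},
        "inputs": [{"text": "What is a managed vector database?"}],
    },
    timeout=30,
)
resp.raise_for_status()

# Dense responses carry one {"values": [...]} entry per input (assumed shape)
values = resp.json()["data"][0]["values"]
print(len(values))  # 1024 for llama-text-embed-v2
```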
## Source & Thanks

> Built by [Pinecone](https://github.com/pinecone-io). Commercial product with a free tier.
>
> [docs.pinecone.io/inference](https://docs.pinecone.io/guides/inference) — Inference docs

---

Source: https://tokrepo.com/en/workflows/pinecone-inference-hosted-embeddings-reranking-api
Author: Pinecone