# Pinecone Assistant — Managed RAG Service with Auto-Indexing

> Pinecone Assistant is the fully managed RAG product on Pinecone. Upload PDFs, query with natural language, get cited answers — no chunking pipeline to build.

## Quick Use

1. Sign up at app.pinecone.io → copy your API key
2. `pip install "pinecone[assistant]"`
3. `pc.assistant.create_assistant(...)`, upload files, call `assistant.chat(messages=...)`

---

## Intro

Pinecone Assistant is the fully managed RAG product — upload PDFs, Word docs, or plain text, and get a chat endpoint that answers with citations. Pinecone handles chunking, embedding, retrieval, prompt construction, and citation rendering.

Best for: teams who want RAG over their docs without building the chunking + embedding + prompt-construction layers themselves.

Works with: Pinecone Python / Node SDK, REST API, Pinecone Console.

Setup time: 5 minutes.

---

### Create an assistant + upload files

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create
assistant = pc.assistant.create_assistant(
    assistant_name="acme-docs",
    instructions="You are an Acme product support assistant. Cite sources.",
)

# Upload files
assistant.upload_file(file_path="./manual.pdf")
assistant.upload_file(file_path="./faq.md")
assistant.upload_file(file_path="./troubleshooting.docx")
```

Pinecone chunks each document, embeds the chunks, stores them in a hidden vector index, and indexes the metadata. Processing happens asynchronously (see the status-polling sketch after the FAQ).

### Chat with citations

```python
from pinecone_plugins.assistant.models.chat import Message

messages = [Message(role="user", content="How do I reset the device?")]
response = assistant.chat(messages=messages, model="claude-3-5-sonnet")

print(response.message.content)
# "To reset the device, hold the power button for 10 seconds [1]. After the
#  light blinks blue, release. The device will return to factory settings [2]."

for citation in response.citations:
    print(citation.references[0].file.name, citation.references[0].pages)
# manual.pdf [page 12]
# manual.pdf [page 13]
```

### Streaming responses

```python
for chunk in assistant.chat_stream(messages=messages):
    print(chunk.message.content, end="", flush=True)
```

### Filter retrieval by metadata

```python
# Tag files at upload
assistant.upload_file(
    file_path="./internal-only.pdf",
    metadata={"audience": "internal", "version": "2.0"},
)

# Filter at query time
response = assistant.chat(
    messages=messages,
    filter={"audience": {"$eq": "public"}},
)
```

### When to use Assistant vs roll-your-own

| Use Assistant | Roll your own |
|---|---|
| Want RAG working in 1 hour | Need full control of chunking strategy |
| OK with Pinecone's chunking | Specialized doc types (legal, medical) |
| Few hundred MB of docs | TB-scale corpora |
| Need cited answers out of the box | Custom prompt + citation format |

---

### FAQ

**Q: Is Pinecone Assistant free?**
A: There's a free tier (2 assistants, limited queries). Paid plans bundle more queries and storage. Underlying LLM (Claude / GPT) costs are billed by Pinecone with a small markup over direct usage.

**Q: Which LLMs can the Assistant use?**
A: GPT-4o, Claude 3.5 Sonnet, and other models Pinecone keeps adding. You pick one at chat time via `model=`. Pinecone handles the API keys and routing.

**Q: How does this differ from a custom RAG stack on a Pinecone index?**
A: Custom RAG: you build chunking, embedding, retrieval, prompt construction, and citations yourself. Assistant: Pinecone builds them and exposes a single `chat()` endpoint. For 80% of use cases, Assistant is faster to ship; for the long tail of custom needs, build it yourself (see the sketch below).
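For a concrete sense of what the Assistant replaces, here is a minimal sketch of the roll-your-own path against a plain Pinecone index. The chunker, the `ingest`/`answer` helpers, the index name, and the `call_llm` placeholder are illustrative assumptions, not part of the Assistant API; the embedding step uses Pinecone's hosted inference as one option among many.

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Assumes you already created a regular index whose dimension matches
# the embedding model (1024 for multilingual-e5-large).
index = pc.Index("acme-docs-diy")

def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on document structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(doc_id: str, text: str) -> None:
    chunks = chunk(text)
    # Embed the chunks (Pinecone hosted inference; any embedder works).
    embs = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=chunks,
        parameters={"input_type": "passage"},
    )
    index.upsert(vectors=[
        {
            "id": f"{doc_id}-{i}",
            "values": embs[i].values,
            "metadata": {"text": chunks[i], "source": doc_id},
        }
        for i in range(len(chunks))
    ])

def answer(question: str) -> str:
    # Embed the query, retrieve the top chunks, build the prompt yourself.
    q = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[question],
        parameters={"input_type": "query"},
    )
    hits = index.query(vector=q[0].values, top_k=5, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)
    prompt = (
        "Answer from the context below and cite your sources.\n\n"
        f"{context}\n\nQ: {question}"
    )
    return call_llm(prompt)  # hypothetical LLM client call, not shown here

```

Everything above, plus the citation bookkeeping the sketch doesn't attempt, is what a single `assistant.chat()` call covers.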
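Uploaded files are indexed asynchronously, so large documents may not be queryable immediately after `upload_file()` returns. A minimal polling sketch, assuming the plugin's `list_files()` helper and an `Available` status value (verify both against your SDK version):

```python
import time

def wait_for_indexing(assistant, poll_seconds: int = 5) -> None:
    """Block until every uploaded file has finished processing."""
    while True:
        # list_files() and the "Available" status string are assumptions
        # based on the Assistant plugin's file API; check your SDK docs.
        files = assistant.list_files()
        pending = [f for f in files if f.status != "Available"]
        if not pending:
            print(f"All {len(files)} files indexed.")
            return
        print(f"{len(pending)} file(s) still processing...")
        time.sleep(poll_seconds)
```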
---

## Source & Thanks

> Built by [Pinecone](https://github.com/pinecone-io). Commercial product with free tier.
>
> [docs.pinecone.io/assistant](https://docs.pinecone.io/guides/assistant) — Assistant docs

---

Source: https://tokrepo.com/en/workflows/pinecone-assistant-managed-rag-service-with-auto-indexing
Author: Pinecone