# Pinecone Assistant — Managed RAG Service with Auto-Indexing

> Pinecone Assistant is the fully managed RAG product on Pinecone. Upload PDFs, query with natural language, get cited answers — no chunking pipeline to build.

## Quick Use

1. Sign up at app.pinecone.io → copy your API key
2. `pip install "pinecone[assistant]"`
3. `pc.assistant.create_assistant(...)`, upload files, call `assistant.chat(messages=...)`

---

## Intro

Pinecone Assistant is the fully managed RAG product — upload PDFs, Word docs, or plain text, and get a chat endpoint that answers with citations. Pinecone handles chunking, embedding, retrieval, prompt construction, and citation rendering.

Best for: teams who want RAG over their docs without building the chunking + embedding + prompt-construction layers themselves.

Works with: Pinecone Python / Node SDK, REST API, Pinecone Console.

Setup time: 5 minutes.

---

### Create an assistant + upload files

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create
assistant = pc.assistant.create_assistant(
    assistant_name="acme-docs",
    instructions="You are an Acme product support assistant. Cite sources.",
)

# Upload files
assistant.upload_file(file_path="./manual.pdf")
assistant.upload_file(file_path="./faq.md")
assistant.upload_file(file_path="./troubleshooting.docx")
```

Pinecone chunks each document, embeds the chunks, stores them in a hidden vector index, and indexes the metadata. Processing happens asynchronously (see the status-polling sketch after the FAQ).

### Chat with citations

```python
from pinecone_plugins.assistant.models.chat import Message

messages = [Message(role="user", content="How do I reset the device?")]
response = assistant.chat(messages=messages, model="claude-3-5-sonnet")

print(response.message.content)
# "To reset the device, hold the power button for 10 seconds [1]. After the
#  light blinks blue, release. The device will return to factory settings [2]."

for citation in response.citations:
    print(citation.references[0].file.name, citation.references[0].pages)
# manual.pdf [page 12]
# manual.pdf [page 13]
```

### Streaming responses

```python
for chunk in assistant.chat_stream(messages=messages):
    print(chunk.message.content, end="", flush=True)
```

### Filter retrieval by metadata

```python
# Tag files at upload
assistant.upload_file(
    file_path="./internal-only.pdf",
    metadata={"audience": "internal", "version": "2.0"},
)

# Filter at query time
response = assistant.chat(
    messages=messages,
    filter={"audience": {"$eq": "public"}},
)
```

### When to use Assistant vs roll-your-own

| Use Assistant | Roll your own |
|---|---|
| Want RAG working in 1 hour | Need full control of chunking strategy |
| OK with Pinecone's chunking | Specialized doc types (legal, medical) |
| Few hundred MB of docs | TB-scale corpora |
| Need cited answers out of the box | Custom prompt + citation format |

---

### FAQ

**Q: Is Pinecone Assistant free?**
A: There's a free tier (2 assistants, limited queries). Paid plans bundle more queries and storage. Underlying LLM (Claude / GPT) costs are billed by Pinecone with a small markup over direct usage.

**Q: Which LLMs can the Assistant use?**
A: GPT-4o, Claude 3.5 Sonnet, and other models Pinecone keeps adding. You pick one at chat time via `model=`. Pinecone handles the API keys and routing.

**Q: How does this differ from a custom RAG stack on a Pinecone index?**
A: Custom RAG: you build chunking, embedding, retrieval, prompt construction, and citations yourself. Assistant: Pinecone builds them and exposes a single `chat()` endpoint. For 80% of use cases, Assistant is faster to ship; for the long tail of custom needs, build it yourself (see the sketch below).
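For a concrete sense of what the Assistant replaces, here is a minimal sketch of the roll-your-own path against a plain Pinecone index. The chunker, the `ingest`/`answer` helpers, the index name, and the `call_llm` placeholder are illustrative assumptions, not part of the Assistant API; the embedding step uses Pinecone's hosted inference as one option among many.

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Assumes you already created a regular index whose dimension matches
# the embedding model (1024 for multilingual-e5-large).
index = pc.Index("acme-docs-diy")

def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on document structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(doc_id: str, text: str) -> None:
    chunks = chunk(text)
    # Embed the chunks (Pinecone hosted inference; any embedder works).
    embs = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=chunks,
        parameters={"input_type": "passage"},
    )
    index.upsert(vectors=[
        {
            "id": f"{doc_id}-{i}",
            "values": embs[i].values,
            "metadata": {"text": chunks[i], "source": doc_id},
        }
        for i in range(len(chunks))
    ])

def answer(question: str) -> str:
    # Embed the query, retrieve the top chunks, build the prompt yourself.
    q = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[question],
        parameters={"input_type": "query"},
    )
    hits = index.query(vector=q[0].values, top_k=5, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)
    prompt = (
        "Answer from the context below and cite your sources.\n\n"
        f"{context}\n\nQ: {question}"
    )
    return call_llm(prompt)  # hypothetical LLM client call, not shown here

```

Everything above, plus the citation bookkeeping the sketch doesn't attempt, is what a single `assistant.chat()` call covers.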
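Uploaded files are indexed asynchronously, so large documents may not be queryable immediately after `upload_file()` returns. A minimal polling sketch, assuming the plugin's `list_files()` helper and an `Available` status value (verify both against your SDK version):

```python
import time

def wait_for_indexing(assistant, poll_seconds: int = 5) -> None:
    """Block until every uploaded file has finished processing."""
    while True:
        # list_files() and the "Available" status string are assumptions
        # based on the Assistant plugin's file API; check your SDK docs.
        files = assistant.list_files()
        pending = [f for f in files if f.status != "Available"]
        if not pending:
            print(f"All {len(files)} files indexed.")
            return
        print(f"{len(pending)} file(s) still processing...")
        time.sleep(poll_seconds)
```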
---

## Source & Thanks

> Built by [Pinecone](https://github.com/pinecone-io). Commercial product with free tier.
>
> [docs.pinecone.io/assistant](https://docs.pinecone.io/guides/assistant) — Assistant docs

---

Source: https://tokrepo.com/en/workflows/pinecone-assistant-managed-rag-service-with-auto-indexing
Author: Pinecone